+ 1. **hive.metastore.glue.security-mapping.config-file**: Path to the JSON configuration file containing Glue security mappings
+ 2. **hive.metastore.glue.security-mapping.refresh-period**: Time interval after which the security mapping configuration will be refreshed
+2. The mapping entries will be processed in the order they are listed in the configuration file. If no mapping entry matches and no default is configured, the access is denied.
+3. The mapping entries will be cached in memory and refreshed after a configurable time period.
+
+An example JSON configuration file
+
+```json
+{
+"mappings": [
+ {
+ "user": "imjalpreet_read_only",
+ "iamRole": "arn:aws:iam::XXXX:role/imjalpreet_read_only"
+ },
+ {
+ "user": "imjalpreet_admin",
+ "iamRole": "arn:aws:iam::XXXX:role/imjalpreet_admin"
+ },
+ {
+ "iamRole": "arn:aws:iam::XXXX:role/imjalpreet_default"
+ }
+]
+}
+```
+
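The processing rules above (entries evaluated in file order, first match wins, an entry without `user` acting as the default, and denial when nothing matches) can be illustrated with a minimal sketch. The class and method names are hypothetical and not part of Presto, and exact string comparison of user names is an assumption made for illustration.

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch only; class and method names are hypothetical.
final class GlueSecurityMappingSketch {
    // One entry of the "mappings" array. An absent user matches every user.
    static final class Mapping {
        final Optional<String> user;
        final String iamRole;

        Mapping(Optional<String> user, String iamRole) {
            this.user = user;
            this.iamRole = iamRole;
        }

        // Assumption: exact, case-sensitive comparison of user names.
        boolean matches(String candidate) {
            return user.map(candidate::equals).orElse(true);
        }
    }

    // Entries are evaluated in file order and the first match wins.
    // An empty result means no entry matched, so access is denied.
    static Optional<String> resolveRole(List<Mapping> mappings, String user) {
        return mappings.stream()
                .filter(mapping -> mapping.matches(user))
                .map(mapping -> mapping.iamRole)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Mapping> mappings = List.of(
                new Mapping(Optional.of("imjalpreet_read_only"), "arn:aws:iam::XXXX:role/imjalpreet_read_only"),
                new Mapping(Optional.of("imjalpreet_admin"), "arn:aws:iam::XXXX:role/imjalpreet_admin"),
                new Mapping(Optional.empty(), "arn:aws:iam::XXXX:role/imjalpreet_default"));
        System.out.println(resolveRole(mappings, "imjalpreet_admin").orElse("ACCESS DENIED"));
        System.out.println(resolveRole(mappings, "another_user").orElse("ACCESS DENIED"));
    }
}
```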
+💡 All IAM Roles which will be mapped to users must at a minimum include lakeformation:GetDataAccess permission and AWSGlueConsoleFullAccess Managed Policy in their attached policies.
+
+[Back to top](#rfc-0004-for-presto-and-aws-lake-formation-integration)
+
+#### 2.3.2 Proposed modifications to implement Glue Metastore Impersonation
+
+Until now, all Glue interactions have used a single, globally configured IAM role. Because every API call was made with the same role regardless of the user, the user identity was not needed when interacting with Glue.
+
+With the introduction of AWS Security Mapping, the user identity must be available when making Glue API calls.
+
+To make this possible, we will be extending the support of metastore impersonation in Presto to Glue integration as well.
+
+In the current `GlueHiveMetastore` implementation, a glue client is created only once in the constructor method which uses STS Assume Role Credentials Provider for the globally configured IAM role.
+
+With the introduction of `MetastoreContext` and AWS Security Mapping, the above implementation needs to change. The base AWS SDK request class, `AmazonWebServiceRequest`, supports setting a credentials provider per request (`setRequestCredentialsProvider()` or `withRequestCredentialsProvider()`). The credentials provider set when creating the client is used only if the request object does not carry its own.
+
+We can leverage the Request Credentials Provider to achieve our use case.
+
+💡 We will cache the Credentials Provider for each identity until there is a change in the AWS Security Mapping.
+
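A minimal sketch of the per-identity caching idea follows. It is not the Presto implementation: the class name is hypothetical, the type parameter `P` stands in for the SDK credentials provider type so the example stays dependency-free, and clearing the whole cache when the mapping changes is an assumed invalidation strategy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch only; P stands in for the SDK credentials provider type.
final class CredentialsProviderCacheSketch<P> {
    private final Map<String, P> providersByUser = new ConcurrentHashMap<>();
    private final Function<String, P> providerFactory;

    CredentialsProviderCacheSketch(Function<String, P> providerFactory) {
        this.providerFactory = providerFactory;
    }

    // Returns the cached provider for the user, creating it on first use.
    P get(String user) {
        return providersByUser.computeIfAbsent(user, providerFactory);
    }

    // Assumed hook: called when a refresh detects a changed security mapping.
    void onSecurityMappingChanged() {
        providersByUser.clear();
    }

    public static void main(String[] args) {
        CredentialsProviderCacheSketch<String> cache =
                new CredentialsProviderCacheSketch<>(user -> "provider-for-" + user);
        System.out.println(cache.get("imjalpreet_admin"));
        cache.onSecurityMappingChanged();
        System.out.println(cache.get("imjalpreet_admin"));
    }
}
```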
+In addition to the above change, we would also need to add a session tag `LakeFormationAuthorizedCaller=clientTag` when assuming the IAM role.
+
+1. **hive.metastore.glue.impersonation.enabled**: Should end-user be impersonated when communicating with the Hive Glue Metastore
+2. If Glue impersonation is enabled, a per-request STS Assume Role Credentials Provider will be created. The role to assume will be determined from the Glue Security Mapping and the Hive Identity.
+
+How to create STSAssumeRoleSessionCredentialsProvider along with Session Tags?
+
+```java
+// Session tag LakeFormationAuthorizedCaller=clientTag; key and value come from configuration
+Tag tag = new Tag().withKey(lakeFormationPartnerTagName).withValue(lakeFormationPartnerTagValue);
+
+// Attach the tag to the assume-role session for the mapped IAM role
+return new STSAssumeRoleSessionCredentialsProvider
+        .Builder(iamRole, "roleSessionName")
+        .withSessionTags(Collections.singletonList(tag))
+        .build();
+```
+
+The below configuration properties will be added to externalize the session tag key and value.
+
+1. **hive.metastore.glue.lakeformation.partner-tag-name**: Name of the partner tag in AWS Lake Formation
+2. **hive.metastore.glue.lakeformation.partner-tag-value**: Value of the partner tag in AWS Lake Formation's authorized partner list
+
+💡 Passing a session tag while assuming an IAM role requires an additional AWS permission (`sts:TagSession`) to be added to the trust policy of the IAM role.
+
+ Sample Trust Policy
+
+```json
+{
+ "Version": "2012-10-17",
+ "Statement": [
+ {
+ "Effect": "Allow",
+ "Principal": {
+ "AWS": [
+ "arn:aws:iam::789986721738:user/imjalpreet-glue",
+ "arn:aws:iam::789986721738:root"
+ ]
+ },
+ "Action": [
+ "sts:AssumeRole",
+ "sts:TagSession"
+ ],
+ "Condition": {}
+ }
+ ]
+}
+```
+
+3. The credential provider created in the previous step will be used to interact with the respective Glue API.
+
+ **How to set per-request credentials provider?**
+
+ ```
+ // An example for the getDatabase call
+ glueClient.getDatabase(new GetDatabaseRequest()
+ .withCatalogId(catalogId)
+ .withName(databaseName)
+ .withRequestCredentialsProvider(credentialsProvider));
+ ```
+
+[Back to top](#rfc-0004-for-presto-and-aws-lake-formation-integration)
+
+#### 2.3.3 Add support for metadata restriction in Presto
+
+Currently, Presto allows all metadata queries regardless of whether the user can access the data in the table. The current Connector Access Control and System Access Control SPIs have no callbacks for metadata queries such as `SHOW CREATE TABLE`, `SHOW COLUMNS` or `DESCRIBE`.
+
+AWS Lake Formation expects the partners to also restrict access to metadata in addition to data. To support metadata restriction, we will be introducing new callback methods in Connector Access Control as well as System Access Control SPI. These methods will then be implemented for different Access Control plug-ins available in Presto adhering to the respective policies/permissions defined in each of them.
+
+Below are the new access control methods that will be added to the SPI as part of this integration:
+
+* Restrict access for `SHOW CREATE TABLE` queries
+ ```java
+ /**
+ * Check if identity is allowed to execute SHOW CREATE TABLE or SHOW CREATE VIEW.
+ *
+ * @throws com.facebook.presto.spi.security.AccessDeniedException if not allowed
+ */
+ void checkCanShowCreateTable(TransactionId transactionId, Identity identity, AccessControlContext context, QualifiedObjectName tableName);
+ ```
+
+* Restrict access for `SHOW COLUMNS` and `DESCRIBE` queries
+ ```java
+ /**
+ * Check if identity is allowed to show columns of tables by executing SHOW COLUMNS,
+ * DESCRIBE etc.
+ *
+ * NOTE: This method is only present to give users an error message when listing is not allowed.
+ * The {@link #filterColumns} method must filter all results for unauthorized users,
+ * since there are multiple ways to list columns.
+ *
+ * @throws com.facebook.presto.spi.security.AccessDeniedException if not allowed
+ */
+ void checkCanShowColumnsMetadata(TransactionId transactionId, Identity identity, AccessControlContext context, CatalogSchemaTableName table);
+ ```
+
+* Callback method to filter columns to those visible to the identity
+ ```java
+ /**
+ * Filter the list of columns to those visible to the identity.
+ */
+ List<ColumnMetadata> filterColumns(TransactionId transactionId, Identity identity, AccessControlContext context, CatalogSchemaTableName table, List<ColumnMetadata> columns);
+ ```
+
+The following configuration properties control how Lake Formation policies are cached:
+
+ * **hive.metastore.glue.lakeformation.policy-cache-ttl**
+ * **hive.metastore.glue.lakeformation.policy-refresh-period**
+ * **hive.metastore.glue.lakeformation.policy-refresh-max-threads**
+
+* AWS Glue provides the `getUnfilteredTableMetadata` API. With the help of Hive Identity, Glue Security Mapping and the trusted partner tag, we can call this API. The response contains both the metadata of the table and the policies defined in Lake Formation (including row filters). The policies fetched for each table will be cached and refreshed after a configurable time.
+ ```java
+ /**
+ * @param getUnfilteredTableMetadataRequest
+ * @return Result of the GetUnfilteredTableMetadata operation returned by the service.
+ * @throws EntityNotFoundException
+ * A specified entity does not exist
+ * @throws InvalidInputException
+ * The input provided was not valid.
+ * @throws InternalServiceException
+ * An internal service error occurred.
+ * @throws OperationTimeoutException
+ * The operation timed out.
+ * @throws GlueEncryptionException
+ * An encryption operation failed.
+ * @throws PermissionTypeMismatchException
+ * @sample AWSGlue.GetUnfilteredTableMetadata
+ * @see
+ * AWS API Documentation
+ */
+ GetUnfilteredTableMetadataResult getUnfilteredTableMetadata(GetUnfilteredTableMetadataRequest getUnfilteredTableMetadataRequest);
+ ```
+
+* Based on the policies fetched, access will be granted or denied for the respective SQL statement.
+
+* **hide-unauthorized-columns**: When enabled, unauthorized columns are silently filtered from the results of `SELECT *` statements. This property has to be set in `config.properties`.
+* We will also be adding a system session property to control the behavior if it’s not set in `config.properties`. The session property will be named `hide_unauthorized_columns`.
+
+The changes will be implemented during the semantic analysis phase, where relations are analyzed in `com.facebook.presto.sql.analyzer.StatementAnalyzer.Visitor#analyzeSelect`, and also where the output scope is created in `com.facebook.presto.sql.analyzer.StatementAnalyzer.Visitor#computeAndAssignOutputScope`.
+
+After these changes are added, when the feature is enabled and a user submits a `SELECT *` query, the unauthorized columns will be filtered out of the output results. If a user tries to directly access a single column that the user is not authorized to read, an `AccessDeniedException` will be thrown, as in the current implementation.
+
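The intended behavior can be sketched as follows. This is an illustration under stated assumptions, not the `StatementAnalyzer` change itself: the class, the `AccessDenied` exception (a stand-in for Presto's `AccessDeniedException`) and the method names are hypothetical, and denying `SELECT *` when the feature is disabled is assumed to match the current behavior.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only; names are hypothetical.
final class HideUnauthorizedColumnsSketch {
    // Stand-in for Presto's AccessDeniedException, to keep the sketch dependency-free.
    static final class AccessDenied extends RuntimeException {
        AccessDenied(String message) {
            super(message);
        }
    }

    // Expands SELECT *: filters silently when the feature is enabled,
    // otherwise every column is checked and the first unauthorized one is rejected.
    static List<String> expandStar(List<String> tableColumns, Set<String> authorized, boolean hideUnauthorizedColumns) {
        if (hideUnauthorizedColumns) {
            return tableColumns.stream()
                    .filter(authorized::contains)
                    .collect(Collectors.toList());
        }
        for (String column : tableColumns) {
            checkCanSelect(column, authorized);
        }
        return tableColumns;
    }

    // Explicit column references are always checked, regardless of the flag.
    static void checkCanSelect(String column, Set<String> authorized) {
        if (!authorized.contains(column)) {
            throw new AccessDenied("Cannot select from column " + column);
        }
    }

    public static void main(String[] args) {
        List<String> columns = List.of("id", "name", "ssn");
        Set<String> authorized = Set.of("id", "name");
        System.out.println(expandStar(columns, authorized, true));
    }
}
```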
+[Back to top](#rfc-0004-for-presto-and-aws-lake-formation-integration)
+
+## 3.0 Operational Considerations
+
+### 3.1 Add Presto as a Trusted Partner in User’s AWS Lake Formation
+
+💡 Pre-requisite: The Credential Vending APIs in AWS Lake Formation and `getUnfilteredTableMetadata` API in AWS Glue are only allowed to be called by trusted partners. Presto must be added as a trusted partner by an admin in Customer's Lake Formation. As depicted in the above design, this can be done using the API `PutDataLakeSettings(clientTag)` (clientTag here is a placeholder for the partner tag)
+
+The above pre-requisite can be enabled in the following way (Remember that this needs to be done by the user):
+
+* Fetch the current Data Lake Settings using the below AWS Lake Formation API
+
+| Property Name | Description | Default | Location |
+| --- | --- | --- | --- |
+| `hive.security=lake-formation` | Name of the security system to use for authorization checks | | Catalog Config |
+| `hive.metastore.glue.impersonation.enabled` | Should end user be impersonated when communicating with the Hive Glue Metastore | false | Catalog Config |
+| `hive.metastore.glue.security-mapping.config-file` | JSON configuration file containing AWS security mappings | | Catalog Config |
+| `hide-unauthorized-columns` | When enabled, unauthorized columns are silently filtered from results of `SELECT *` statements | false | config.properties |
+| `hive.metastore.glue.security-mapping.refresh-period` | Time interval after which the security mapping configuration will be refreshed | 30 seconds | Catalog Config |
+| `hive.metastore.glue.lakeformation.partner-tag-name` | Name of the partner tag in AWS Lake Formation | LakeFormationAuthorizedCaller | Catalog Config |
+| `hive.metastore.glue.lakeformation.partner-tag-value` | Value of the partner tag in AWS Lake Formation's authorized partner list | presto | Catalog Config |
+| `hive.metastore.glue.lakeformation.policy-cache-ttl` | Time after which policies will be cleared from the cache | 120 minutes | Catalog Config |
+| `hive.metastore.glue.lakeformation.policy-refresh-period` | Time interval after which cached policies will be refreshed | 5 minutes | Catalog Config |
+| `hive.metastore.glue.lakeformation.policy-refresh-max-threads` | Max number of refresh threads | 1 | Catalog Config |
+| `hive.metastore.glue.lakeformation.supported-permission-type` | Type of AWS Lake Formation permissions supported: `COLUMN_PERMISSION` or `CELL_FILTER_PERMISSION` | CELL_FILTER_PERMISSION | Catalog Config |