-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RFC for access control row filtering and column masking #34
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,226 @@ | ||
# **RFC-0010 for Presto** | ||
|
||
See [CONTRIBUTING.md](CONTRIBUTING.md) for instructions on creating your RFC and the process surrounding it. | ||
|
||
## [Access Control Row Filters and Column Masks] | ||
|
||
Proposers | ||
|
||
* Tim Meehan | ||
* Bryan Cutler | ||
|
||
## [Related Issues] | ||
|
||
Current Proposal <br> | ||
https://github.com/prestodb/presto/issues/24278 | ||
|
||
Past Discussions and PRs | ||
<br> | ||
https://github.com/prestodb/presto/issues/20572 | ||
<br> | ||
https://github.com/prestodb/presto/issues/19041 | ||
<br> | ||
https://github.com/prestodb/presto/pull/21913 | ||
<br> | ||
https://github.com/prestodb/presto/pull/18119 | ||
|
||
## Summary | ||
|
||
Add access control support for row filtering and column masking, and apply them to queries by rewriting the plan. | ||
|
||
## Background | ||
|
||
As a part of governance requirements, Presto is needed to support data compliance coming from a set of defined rules. For a | ||
given query, these rules are made into expressions for row filters and column masks. A row filter is used to prevent display of | ||
certain rows with data the user does not have access to, while allowing remaining rows that are allowed. A column mask can be | ||
used to mask or obfuscate sensitive data that a user is forbidden to view, such as a credit card number. The expressions can | ||
then be applied to the query during a rewrite before it is run. | ||
|
||
### [Optional] Goals | ||
|
||
* Add SPIs to AccessControl classes to allow retrieval of row filters and column masks | ||
* Add functionality for Presto to apply filters and masks to a query | ||
|
||
## Proposed Implementation | ||
|
||
The proposed implementation follows the design from TrinoDB. The filters and masks are retrieved in the `StatementAnalyzer` | ||
and the query is rewritten with the `RelationPlanner`. For reference, orginal TrinoDB commits: | ||
|
||
* Adding support for row filters trinodb/trino@fae3147 | ||
* Adding support for column masking trinodb/trino@7e0d88e | ||
|
||
### Core SPI | ||
|
||
Add methods to access control interfaces to retrieve a list of row filters and columns masks for all relevant columns in a | ||
table. | ||
|
||
AccessControl.java | ||
|
||
```java | ||
default List<ViewExpression> getRowFilters(TransactionId transactionId, Identity identity, AccessControlContext context, QualifiedObjectName tableName) | ||
{ | ||
return Collections.emptyList(); | ||
} | ||
|
||
default Map<ColumnMetadata, ViewExpression> getColumnMasks(TransactionId transactionId, Identity identity, AccessControlContext context, QualifiedObjectName tableName, List<ColumnMetadata> columns) | ||
{ | ||
return Collections.emptyMap(); | ||
} | ||
``` | ||
|
||
ConnectorAccessControl.java | ||
|
||
```java | ||
/** | ||
* Get row filters associated with the given table and identity. | ||
* <p> | ||
* Each filter must be a scalar SQL expression of boolean type over the columns in the table. | ||
* | ||
* @return the list of filters, or empty list if not applicable | ||
*/ | ||
default List<ViewExpression> getRowFilters(ConnectorTransactionHandle transactionHandle, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName) | ||
{ | ||
return Collections.emptyList(); | ||
} | ||
|
||
/** | ||
* Bulk method for getting column masks for a subset of columns in a table. | ||
* <p> | ||
* Each mask must be a scalar SQL expression of a type coercible to the type of the column being masked. The expression | ||
* must be written in terms of columns in the table. | ||
* | ||
* @return a mapping from columns to masks, or an empty map if not applicable. The keys of the return Map are a subset of {@code columns}. | ||
*/ | ||
default Map<ColumnMetadata, ViewExpression> getColumnMasks(ConnectorTransactionHandle transactionHandle, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName, List<ColumnMetadata> columns) | ||
{ | ||
return Collections.emptyMap(); | ||
} | ||
``` | ||
|
||
SystemAccessControl.java | ||
```java | ||
|
||
/** | ||
* Get row filters associated with the given table and identity. | ||
* <p> | ||
* Each filter must be a scalar SQL expression of boolean type over the columns in the table. | ||
* | ||
* @return a list of filters, or empty list if not applicable | ||
*/ | ||
default List<ViewExpression> getRowFilters(Identity identity, AccessControlContext context, CatalogSchemaTableName tableName) | ||
{ | ||
return Collections.emptyList(); | ||
} | ||
|
||
/** | ||
* Bulk method for getting column masks for a subset of columns in a table. | ||
* <p> | ||
* Each mask must be a scalar SQL expression of a type coercible to the type of the column being masked. The expression | ||
* must be written in terms of columns in the table. | ||
* | ||
* @return a mapping from columns to masks, or an empty map if not applicable. The keys of the return Map are a subset of {@code columns}. | ||
*/ | ||
default Map<ColumnMetadata, ViewExpression> getColumnMasks(Identity identity, AccessControlContext context, CatalogSchemaTableName tableName, List<ColumnMetadata> columns) | ||
{ | ||
return Collections.emptyMap(); | ||
} | ||
``` | ||
|
||
ViewExpression class to hold a filter/mask expression | ||
|
||
```java | ||
public ViewExpression(String identity, Optional<String> catalog, Optional<String> schema, String expression) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Some questions about
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the clarifications. As for the schema evolution bits: In Iceberg you have you could perform the following set of operations CREATE TABLE test(i int)
ALTER TABLE test RENAME COLUMN i TO j So imagine you have a column filter say, If you try to query the table at this point, will the query just fail because there is no longer an Similarly, if you think in more malicious terms, you could do something like: CREATE TABLE test(i int)
ALTER TABLE test RENAME COLUMN i TO j
ALTER TABLE test ADD COLUMN i From my understanding, this would now allow the user to view data which would have previously been masked if the filters aren't updated properly. In iceberg, you can also query old versions of tables with the previous schema CREATE TABLE test(i int) -- say, version A
ALTER TABLE test RENAME COLUMN i TO j -- say, version B
-- Adjust filter to j % 10 now
SELECT * FROM "test@B" Would this result in a failure to query older versions if the masks/row filters don't apply? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think most of this is beyond the scope of this SPI and more related to how a specific implementation of access control policy might work. If a user has restricted access to a column, then any sane policy would not allow them to alter the table on it, and stop them right there. These are good questions though and I don't know if all malicious cases can be covered. What we want to get right is that this SPI can provide the necessary info for a specific implementation to handle these cases. Having some example implementations in the code base could help make sure these types of actions would be covered. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree that a sane policy would stop them, but what if it was done by accident? Or someone forgets to update the policy? Like you said - I guess it would technically be up to the plugin implementation to determine the correct filter/view expression. I am just trying to understand the responsibilities of Presto vs Plugin. Also, some more iceberg specific comments: For iceberg tables - in order to determine which schema (and potentially columns to use) you would need to pass the entire table name to the function. However, the concern I have is that within the Iceberg plugin we have connector-specific logic for parsing the table name to determine the table version, which in-turn, helps us determine the schema and table version we query in the Iceberg connector. For example, an Iceberg table queried at time This allows querying iceberg tables via SYSTEM_TIME or SYSTEM_VERSION. I would consider adding the version/timestamp information in the query to the plugin SPI as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The responsibility of Presto is to supply the qualified object name, context and other info to the access control plugin. It's not responsible to manage and update the filters/masks on the query, that needs to be done by the plugin. I checked an Iceberg example that uses some versioning and this is what is in the
It looks like all the information is there, the access control would need to interpret it correctly though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, so my concern is that we get So for this specific case I think there's two things that the interface needs:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
You can implement ConnectorAccessControl inside connector. So access plugin inside connector can handle table schema and policy evolution.
Will each table type have different schema and policy? Can we treat tables types as different tables?
Good observation. The same problem exists for other table level methods like AccessControl#checkCanSelectFromColumns |
||
``` | ||
|
||
Analysis.java will hold filters and masks for the table with additional methods | ||
|
||
```java | ||
void registerTableForRowFiltering(QualifiedObjectName table, String identity) | ||
|
||
boolean hasRowFilter(QualifiedObjectName table, String identity) | ||
|
||
void addRowFilter(Table table, Expression filter) | ||
|
||
List<Expression> getRowFilters(Table node) | ||
|
||
void registerTableForColumnMasking(QualifiedObjectName table, String column, String identity) | ||
|
||
boolean hasColumnMask(QualifiedObjectName table, String column, String identity) | ||
|
||
void addColumnMask(Table table, String column, Expression mask) | ||
|
||
Map<String, Expression> getColumnMasks(Table table) | ||
``` | ||
#### Example expressions | ||
|
||
Examples of row filter expressions, given a table `orders` with columns `orderkey`, `nationkey`: | ||
|
||
- a simple predicate: | ||
``` | ||
expression := "orderkey < 10" | ||
``` | ||
|
||
- a subquery: | ||
``` | ||
expression := "EXISTS (SELECT 1 FROM nation WHERE nationkey = orderkey)" | ||
``` | ||
|
||
A column mask will apply an operation on a specific column, given the column values as input produce the masked output. | ||
- example to nullify values, whatever column this is applied to will produce a NULL value: | ||
``` | ||
expression := "NULL" | ||
``` | ||
|
||
- example to negate a column integer values, when applied to column "custkey": | ||
``` | ||
expression := "-custkey" | ||
``` | ||
|
||
#### Additional information | ||
1. What modules are involved | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
- `presto-main` | ||
- `presto-spi` | ||
- `presto-analyzer` to hold masks and filters | ||
- `presto-hive` for legacy and sql access control | ||
2. Any new terminologies/concepts/SQL language additions | ||
- NA | ||
3. Method/class/interface contracts which you deem fit for implementation. | ||
- NA | ||
4. Code flow using bullet points or pseudo code as applicable | ||
- During analysis phase, access control apis used to retrieve and analyze row filters and column masks. | ||
- Analyzed filters and masks are stored in `Analysis`. | ||
- `RelationPlanner` will the get the filters and masks from `Analysis` and rewrite a new plan with them applied. | ||
- By default no filters or masks are added and the plan will not be rewritten. | ||
5. Any new user facing metrics that can be shown on CLI or UI. | ||
- NA | ||
|
||
## [Optional] Metrics | ||
|
||
This is a 0 to 1 feature and will not have any metrics. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be nice to have a RuntimeMetric or field in the query info that tracks how many (a count) row filters/column masks were applied to a TableScan There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that sounds like it could be useful. But the access control plugin will know exactly what filters and masks are being applied, so couldn't that be done from within the plugin? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So if we have both system and catalog level filters, we would be better off having a system-level metric that capture's all applied filters/masks Also, at the moment, we do not pass/create a I'm okay with having metrics for this feature as a non-goal; these can be added later |
||
|
||
## [Optional] Other Approaches Considered | ||
|
||
Discussed at https://github.com/prestodb/presto/pull/21913#issuecomment-2050279419 is an approach to use the existing SPI for connector | ||
optimization to rewrite plan during the optimization phase. The benefits of the proposed design over this approach is that it applies | ||
globally to all connectors. Since it is an existing design that has already been in use, it is known to be working, stable and | ||
conforms with the Trino SPI which will help to ease migration. | ||
|
||
## Adoption Plan | ||
|
||
- What impact (if any) will there be on existing users? Are there any new session parameters, configurations, SPI updates, client API updates, or SQL grammar? | ||
- No impact to users. SPI additions will include a default to keep exiting behaviour. AccessControl plugin can be used to enable the functionality. | ||
- If we are changing behaviour how will we phase out the older behaviour? | ||
- NA | ||
- If we need special migration tools, describe them here. | ||
- NA | ||
- When will we remove the existing behaviour, if applicable. | ||
- NA | ||
- How should this feature be taught to new and existing users? Basically mention if documentation changes/new blog are needed? | ||
- This feature will be documented in the Presto documentation. | ||
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? | ||
- NA | ||
|
||
## Test Plan | ||
|
||
Unit tests will be added to ensure that row filter and column mask expressions can be added to a query and give the expected result. The | ||
`TestingAccessControlManager` will be modified to allow for addition of row filters and column masks to be used in testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a lot of methods for retrieving the filters and masks, but will we support any API for setting them through Presto? Or will it solely be done through some kind of configuration/external service?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for the SPI changes initially. If approved, it would be easy to port over some existing implementations that are compatible, like Ranger https://github.com/trinodb/trino/blob/9499dc82f2d23314dbc76b0443bedd121e6400eb/plugin/trino-ranger/src/main/java/io/trino/plugin/ranger/RangerSystemAccessControl.java#L822