Adding optimization rewrite pass to utilize server with information about masked columns #443
…ptbank_filter_count_01
| [ | ||
| MaskServerInput( | ||
| table_path="srv.db.tbl", | ||
| table_path="db.tbl", |
The srv. part is added elsewhere
| self.stack.clear() | ||
| def visit_call_expression(self, expr: CallExpression) -> None: | ||
| # TODO: ADD COMMENTS |
Comments regarding IN and ISIN operators and a type hint
| mapping each such operator to the string name used in the linear string | ||
| serialization format recognized by the Mask Server. | ||
| Note: ISIN is handled separately. |
Are these the operators used in the mock server? If so, we should add IN and NOT_IN (can be found in the lookup table)
These are the operators used in the real server (which the mock server should emulate). The point isn't to include all of their operators (e.g. we don't do regex); it's to include all of the mappings from our operators to theirs. ISIN is handled separately from this mapping, and we don't currently use NOT_ISIN at all: we just do ISIN and sometimes wrap the result in a NOT call. There is no operator in PyDough which maps to NOT_ISIN.
| in_list.extend(literal_list) | ||
| # The result list is: | ||
| # 1. The operator name "IN" |
I'm confused regarding the operators. Are we using the mask_server operator (IN and NOT_IN) or ISIN? I see sometimes IN and other times ISIN
This is how we convert PyDough relational terms to the mask server terms. Specifically, we are converting a call of ISIN to the mask server operator IN. If the PyDough code is NOT(ISIN(...)), then the list returned by this function will get wrapped in a NOT call.
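As a rough illustration of that conversion, here is a minimal sketch. The function name `serialize_isin` is hypothetical, the layout (operator name, argument count, `__col__` placeholder, then the literals) is inferred from the BETWEEN expression in the test data, and the exact shape of the NOT wrapper is an assumption rather than the PR's actual code.

```python
def serialize_isin(literals: list, negated: bool = False) -> list:
    """Sketch: turn ISIN(col, literals) into a linear list for the mask server.

    Assumed layout: [operator name, argument count, arguments...], where the
    masked column is represented by the placeholder "__col__".
    """
    # The column placeholder counts as one argument, plus one per literal.
    result = ["IN", len(literals) + 1, "__col__", *literals]
    if negated:
        # Assumption: NOT(ISIN(...)) wraps the IN list in a unary NOT call.
        result = ["NOT", 1, *result]
    return result


print(serialize_isin(["a", "b"]))
print(serialize_isin(["a", "b"], negated=True))
```

Under these assumptions there is no need for a separate NOT_IN mapping on the PyDough side, since negation is expressed by the outer wrapper.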
…ter/hour/minute/second, coalesce,iff, join_strings, smallest/largest, and abs
… handled cases where the in/not in list contains a NULL
I'll leave the approval to @hadia206 and @john-sanchez31
| table_path="srv.db.orders", | ||
| table_path="db.orders", | ||
| column_name="order_date", | ||
| expression=["BETWEEN", 3, "__col__", "2025-01-01", "2025-02-01"], |
We need to test the use of QUOTE, for values using predicate reserved words like an OP name or __col__
We should also test values having single-quote, double-quote, comma, square brackets and curly braces
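A possible starting point for those test cases is sketched below. Everything here is hypothetical: the `RESERVED` set, the `encode_literal` helper, and the `["QUOTE", 1, value]` form are assumptions made for illustration, since the actual QUOTE encoding used by the mask server is not shown in this conversation.

```python
# Hypothetical set of words the predicate format treats specially.
RESERVED = {"__col__", "IN", "NOT", "BETWEEN", "QUOTE"}

# Literals containing the punctuation called out in the review.
TRICKY_VALUES = ["it's", 'say "hi"', "a,b", "[x]", "{y}"]


def encode_literal(value):
    """Sketch: wrap a literal in QUOTE only if it collides with a reserved word."""
    if isinstance(value, str) and value in RESERVED:
        # Assumed form of a quoted literal: operator name, argument count, value.
        return ["QUOTE", 1, value]
    return [value]


for value in ["__col__", "BETWEEN", *TRICKY_VALUES]:
    print(repr(value), "->", encode_literal(value))
```

In this sketch the punctuation-heavy values pass through unchanged, so a real test would still need to check that they survive whatever string serialization the request uses.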
Augmenting relational optimization to rewrite expressions containing an UNMASK operator when a server is mounted to the PyDough session (and the environment variable is activated):
- additional_shuttles list, before the masking literal comparisons shuttle.
- MaskServerCandidateShuttle is a no-op shuttle that just traverses the entire tree to find expressions that can potentially be rewritten and adds them to a pool.
- MaskServerRewriteShuttle looks for expressions in the candidate shuttle's pool, and once it finds one it sends every candidate in the pool into a batch request to the mask server, processing the output results to create the new relational node. The candidate pool is then emptied so future invocations will not re-do the same batch calculation.
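The two-pass design described above can be sketched roughly as follows. This is a simplified stand-in, not the PR's implementation: expressions are modeled as nested tuples, and `is_candidate`, `collect_candidates`, `Rewriter`, and the `server` callable are all hypothetical names.

```python
def is_candidate(expr) -> bool:
    """A call is a candidate if one of its direct arguments is an UNMASK call."""
    return isinstance(expr, tuple) and any(
        isinstance(arg, tuple) and len(arg) > 0 and arg[0] == "UNMASK"
        for arg in expr[1:]
    )


def collect_candidates(expr, pool: list) -> None:
    """First pass: walk the whole tree and record candidates without changing it."""
    if not isinstance(expr, tuple):
        return
    if is_candidate(expr) and expr not in pool:
        pool.append(expr)
    for arg in expr[1:]:
        collect_candidates(arg, pool)


class Rewriter:
    """Second pass: resolve the whole pool in one batch on the first hit."""

    def __init__(self, pool: list, server):
        self.pool = pool
        self.server = server  # callable: list of exprs -> list of replacements or None
        self.answers = {}

    def rewrite(self, expr):
        if not isinstance(expr, tuple):
            return expr
        if expr in self.pool:
            batch = list(self.pool)
            self.answers.update(zip(batch, self.server(batch)))
            # Empty the pool so later visits do not repeat the batch request.
            self.pool.clear()
        replacement = self.answers.get(expr)
        if replacement is not None:
            return replacement
        # No rewrite available: rebuild the call from rewritten arguments.
        return (expr[0], *(self.rewrite(arg) for arg in expr[1:]))
```

Sharing one pool between the two passes is what lets the rewrite pass issue a single request for all candidates instead of one request per expression.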