
@knassre-bodo commented Oct 9, 2025

This PR augments relational optimization to rewrite expressions containing an UNMASK operator when a mask server is mounted to the PyDough session (and the corresponding environment variable is enabled):

  • When this is the case, two additional shuttles are added to the additional_shuttles list, before the masking literal comparisons shuttle.
  • The first shuttle, MaskServerCandidateShuttle, is a no-op shuttle that traverses the entire tree to find expressions that could potentially be rewritten and adds them to a candidate pool.
  • The second shuttle, MaskServerRewriteShuttle, looks for expressions that are in the candidate shuttle's pool. Once it finds one, it sends every candidate in the pool to the mask server in a single batch request and processes the results to create the new relational node. The candidate pool is then emptied so that later invocations do not redo the same batch calculation.
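The two-pass pattern described in the bullets (collect candidates first, then batch them on the first hit) can be sketched as follows. This is a minimal illustration, not PyDough's shuttle API: Column, Call, CandidateCollector, BatchRewriter, the "a call with an UNMASK operand" candidate rule, and the send_batch callable standing in for the mask server request are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


# Hypothetical, simplified expression nodes (not PyDough's classes).
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Call:
    op: str
    args: tuple


class CandidateCollector:
    """Pass 1: a no-op traversal that only records rewrite candidates."""

    def __init__(self, pool: list):
        self.pool = pool

    def visit(self, expr: Any) -> None:
        if isinstance(expr, Call):
            for arg in expr.args:
                self.visit(arg)
            # Assumed candidate rule: a call with an UNMASK(...) operand.
            if any(isinstance(a, Call) and a.op == "UNMASK" for a in expr.args):
                self.pool.append(expr)


class BatchRewriter:
    """Pass 2: the first candidate hit triggers one batch request for the pool."""

    def __init__(self, pool: list, send_batch: Callable[[list], list]):
        self.pool = pool
        self.send_batch = send_batch  # stand-in for the mask server request
        self.results: dict = {}

    def rewrite(self, expr: Any) -> Any:
        if not isinstance(expr, Call):
            return expr
        if expr in self.pool and expr not in self.results:
            answers = self.send_batch(list(self.pool))
            self.results.update(zip(self.pool, answers))
            self.pool.clear()  # later visits reuse results, no second request
        if expr in self.results:
            answer: Optional[Call] = self.results[expr]
            # None means the server could not rewrite it: keep the original.
            return answer if answer is not None else expr
        return Call(expr.op, tuple(self.rewrite(a) for a in expr.args))


# Usage: two UNMASK predicates under an AND, answered by a fake server.
pool: list = []
pred = Call("AND", (
    Call("EQ", (Call("UNMASK", (Column("a"),)), 1)),
    Call("GT", (Call("UNMASK", (Column("b"),)), 5)),
))
CandidateCollector(pool).visit(pred)

batch_sizes: list = []


def fake_server(batch: list) -> list:
    batch_sizes.append(len(batch))
    return [Call("EQ", (Column("a"), "masked-1")), None]


rewritten = BatchRewriter(pool, fake_server).rewrite(pred)
```

Both candidates go out in one request, the pool ends up empty, and the candidate the fake server declined (None) is left untouched.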

```diff
 [
     MaskServerInput(
-        table_path="srv.db.tbl",
+        table_path="db.tbl",
```
Contributor Author

The "srv." prefix is added elsewhere.

@knassre-bodo knassre-bodo marked this pull request as ready for review October 16, 2025 19:08
@knassre-bodo knassre-bodo requested review from a team, hadia206, john-sanchez31 and juankx-bodo and removed request for a team October 16, 2025 19:09
```python
        self.stack.clear()

    def visit_call_expression(self, expr: CallExpression) -> None:
        # TODO: ADD COMMENTS
```
Contributor Author

Suggested change: delete the "# TODO: ADD COMMENTS" line.

@john-sanchez31 left a comment

Comments regarding the IN and ISIN operators, and a type hint.

```python
    mapping each such operator to the string name used in the linear string
    serialization format recognized by the Mask Server.
    Note: ISIN is handled separately.
```
Contributor

Are these the operators used in the mock server? If so, we should add IN and NOT_IN (they can be found in the lookup table).

Contributor Author

@knassre-bodo, Oct 23, 2025

These are the operators used in the real server (which the mock server should emulate). The point isn't to include all of their operators (e.g., we don't map regex); it's to include every mapping from our operators to theirs. ISIN is handled separately from this mapping, and we don't currently use NOT_ISIN at all: we just emit ISIN and sometimes wrap the result in a NOT call. No PyDough operator maps to NOT_ISIN.
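To make the "mapping from our operators to theirs" concrete, here is a minimal sketch of a lookup table feeding a linear prefix serializer. Only the BETWEEN layout (name, argument count, then "__col__" and the literals) comes from this PR's test fixture; the table entries other than BETWEEN, the (op, args) tuple input, the linearize function, and the way nested calls are flattened are assumptions for illustration.

```python
from typing import Any, Optional

# Hypothetical mapping from PyDough-side operator names to mask server names.
# Only "BETWEEN" is taken from this PR's test fixture; the rest are placeholders.
OPERATOR_TO_SERVER: dict[str, str] = {
    "EQ": "EQUAL",
    "LT": "LESS_THAN",
    "AND": "AND",
    "BETWEEN": "BETWEEN",
}


def linearize(expr: Any) -> Optional[list]:
    """Flatten an (op, args) tuple tree into prefix form: NAME, arg count, args.

    Leaves (literals and the "__col__" placeholder) are emitted as-is. Returns
    None when any operator has no server equivalent, so the caller can skip
    the rewrite instead of sending something the server cannot parse.
    """
    if not isinstance(expr, tuple):
        return [expr]
    op, args = expr
    name = OPERATOR_TO_SERVER.get(op)
    if name is None:
        return None
    out: list = [name, len(args)]
    for arg in args:
        piece = linearize(arg)
        if piece is None:
            return None
        out.extend(piece)
    return out
```

An operator absent from the table (a regex match, say) makes the whole expression ineligible, which matches the point that the table lists only the operators we translate, not every operator the server supports.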

```python
in_list.extend(literal_list)

# The result list is:
# 1. The operator name "IN"
```
Contributor

I'm confused about the operators. Are we using the mask server operators (IN and NOT_IN) or ISIN? Sometimes I see IN and other times ISIN.

Contributor Author

This is how we convert PyDough relational terms to mask server terms. Specifically, we convert a call to ISIN into the mask server operator IN. If the PyDough code is NOT(ISIN(...)), the list returned by this function gets wrapped in a NOT call.
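The ISIN-to-IN conversion and the optional NOT wrapper can be sketched like this. It reuses the in_list and literal_list names from the snippet under review, but the isin_to_server function, the argument count of 1 + len(literal_list) (counting "__col__" as the first argument, by analogy with the BETWEEN fixture), and the ["NOT", 1, ...] wrapper layout are assumptions, not the PR's confirmed format.

```python
from typing import Any

COLUMN_PLACEHOLDER = "__col__"


def isin_to_server(literal_list: list[Any], negated: bool = False) -> list[Any]:
    """Translate ISIN(column, literal_list) into the server's IN form.

    Assumes the [NAME, arg_count, *args] layout seen in the BETWEEN fixture,
    with the column placeholder counted as the first argument.
    """
    in_list: list[Any] = ["IN", 1 + len(literal_list), COLUMN_PLACEHOLDER]
    in_list.extend(literal_list)
    if negated:
        # NOT(ISIN(...)) in PyDough: wrap the IN list in a one-argument NOT call.
        return ["NOT", 1, *in_list]
    return in_list
```

Because negation is expressed by wrapping, no separate NOT_IN mapping entry is needed, which is consistent with the explanation that no PyDough operator maps to the server's negated form directly.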

…ter/hour/minute/second, coalesce,iff, join_strings, smallest/largest, and abs
… handled cases where the in/not in list contains a NULL
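The NULL case in the commit above matters because SQL's IN and NOT IN use three-valued logic: x NOT IN (1, NULL) can never be TRUE, so a WHERE clause using it keeps no rows. The sketch below models the standard SQL semantics with None as NULL; it does not show how the commit itself handles the case, and sql_in / sql_not_in are illustrative names only.

```python
from typing import Any, Optional


def sql_in(value: Any, items: list[Any]) -> Optional[bool]:
    """SQL three-valued `value IN (items)`, using None to model NULL."""
    if value is None:
        return None
    if any(item is not None and item == value for item in items):
        return True
    # No match: a NULL in the list makes the answer unknown rather than FALSE.
    return None if any(item is None for item in items) else False


def sql_not_in(value: Any, items: list[Any]) -> Optional[bool]:
    """`value NOT IN (items)` is the three-valued negation of IN."""
    result = sql_in(value, items)
    return None if result is None else not result
```

Any rewrite of an IN or NOT IN predicate therefore has to preserve the "unknown" outcome when the list contains NULL, not just the TRUE/FALSE outcomes.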
@juankx-bodo left a comment

I'll leave the approval to @hadia206 and @john-sanchez31.

```diff
-        table_path="srv.db.orders",
+        table_path="db.orders",
         column_name="order_date",
         expression=["BETWEEN", 3, "__col__", "2025-01-01", "2025-02-01"],
```
Contributor

We need to test the use of QUOTE for values that collide with predicate reserved words, such as an operator name or __col__.

Contributor

We should also test values containing single quotes, double quotes, commas, square brackets, and curly braces.
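The two testing suggestions can be turned into a small case matrix. This sketch assumes, without confirmation from the PR, that a reserved-word literal is encoded as a one-argument QUOTE call (["QUOTE", 1, value]) and that the batch request is sent as JSON; encode_literal, check_case, the RESERVED_TOKENS subset, and the "EQUAL" operator name are hypothetical.

```python
import json
from typing import Any

# Illustrative subset of reserved tokens; the real list would come from the
# mask server's operator table.
RESERVED_TOKENS = {"__col__", "IN", "NOT", "BETWEEN", "AND", "OR", "QUOTE"}


def encode_literal(value: Any) -> list[Any]:
    """Hypothetical literal encoder: wrap reserved-word strings in a QUOTE call."""
    if isinstance(value, str) and value in RESERVED_TOKENS:
        return ["QUOTE", 1, value]
    return [value]


# Cases suggested in review: reserved words, then punctuation that a naive
# string-based serializer could misread.
RESERVED_CASES = ["__col__", "BETWEEN", "IN", "QUOTE"]
PUNCTUATION_CASES = ["O'Brien", 'say "hi"', "a,b", "[x]", "{y}"]


def check_case(value: Any) -> bool:
    """Encode `value` inside a comparison and confirm it survives a JSON round trip."""
    encoded = ["EQUAL", 2, "__col__", *encode_literal(value)]
    return json.loads(json.dumps(encoded)) == encoded
```

Each case would also need an end-to-end check against the mock server, since a JSON round trip only shows that the transport preserves the characters, not that the server parses the predicate correctly.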


4 participants