-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BSE-4155] Add support for the Join relational node #42
Conversation
super().__init__(data_type) | ||
self._name: str = name | ||
self._input_name: str | None = input_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should allow join to generate input specific columns in the conversion using "left" or "right".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are not doing full qualification (just "does it come from the LHS or RHS of the join) does that mean we must insert naming/pruning projections after every join so that the left/right sides are unified afterwards? Otherwise, if there are subsequent joins, how will we know what "left" refers to, if "left" also has its own "left" and "right"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No these are purely ephemeral for translating your input and wouldn't be represented in the output. My thought was that we just need to be able to resolve relative to the node and then lowering could optionally translate this into the SQL.
So for example, imagine we opted to have join always use left and right. Then we could execute the following steps:
- Assign the inputs the join the aliases left and right.
- Generate expressions using these aliases.
- Generate the output which won't depend on any table alias.
Alternatively the lowering could only generate aliases when its necessary or rather than doing left/right every time could generate unique aliases to map left/right to, possibly via a counter. The exact procedure isn't important so long as we can tell the origin for every column where it may be ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments, but overall LGTM.
pydough/relational/join.py
Outdated
class JoinType(Enum): | ||
INNER = "inner" | ||
LEFT = "left" | ||
RIGHT = "right" | ||
FULL_OUTER = "full outer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should include semi
and anti
, bc PyDough will have convenience syntax for HAS
and HASNOT
, and we can just fast-track them to that path + deal with the specifics of the translation in the SqlGlot step.
SQLGlot does have join kinds for semi/anti, and does the EXIST
translation step for us if the dialect doesn't allow them (https://github.com/tobymao/sqlglot/blob/79f67830d7d3ba92bff91eeb95b4dc8bdfa6c44e/sqlglot/dialects/sqlite.py#L199).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example of what I see this looking like:
- Suppose we have
S.WHERE(HAS(T) & p1 & p2)
. That becomesJOIN(semi, S, T).WHERE(p1 & p2)
- Suppose we have
S.WHERE(HASNOT(T) & p1 & p2)
. That becomesJOIN(anti, S, T).WHERE(p1 & p2)
- If
HAS
is used in any other patterns besides in the conjunction of aWHERE
, it is expanded toCOUNT(T) > 0
, andHASNOT
intoCOUNT(T) == 0
, then translated accordingly.
We could even make the expansion happen in the AST phase, so all HAS/HASNOT present during AST->Relational translation must be semi/anti joins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add this when we have the syntax.
super().__init__(data_type) | ||
self._name: str = name | ||
self._input_name: str | None = input_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are not doing full qualification (just "does it come from the LHS or RHS of the join) does that mean we must insert naming/pruning projections after every join so that the left/right sides are unified afterwards? Otherwise, if there are subsequent joins, how will we know what "left" refers to, if "left" also has its own "left" and "right"
Adds support for the basic definition of the Join relational node.