-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support uses of BACK that cause correlated references: setup decorrelation handling #251
Support uses of BACK that cause correlated references: setup decorrelation handling #251
Conversation
Co-authored-by: Hadia Ahmed <[email protected]>
Revision 2 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 3 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 4 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 5 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 6 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 7 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 8 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 9 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 10 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 11 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 12 Co-authored-by: Hadia Ahmed <[email protected]>
Revision 13 Co-authored-by: Hadia Ahmed <[email protected]>
if levels == 0: | ||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a useful trick to have rather than having to case on this behavior elsewhere, so x.shift_back(0)
returns x
.
@@ -895,6 +897,7 @@ def __init__( | |||
self._is_connection_root: bool = is_connection_root | |||
self._agg_keys: list[HybridExpr] | None = None | |||
self._join_keys: list[tuple[HybridExpr, HybridExpr]] | None = None | |||
self._correlated_children: set[int] = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This set of indices indicates which of the elements in _children
contain a correlated reference specifically to self
. This is important because only those children should have the de-correlation procedure run on them, and only if they are of a certain type.
case HybridRefExpr(): | ||
parent_result = HybridBackRefExpr( | ||
parent_result.expr.name, 1, parent_result.typ | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This weird extra-shifting trick is no longer needed, and was actually incorrect.
if not isinstance(parent_result, HybridCorrelExpr): | ||
parent_tree.correlated_children.add(len(parent_tree.children)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how we know which children of the hybrid tree are correlated. If parent_result
is a correl expr, it means that the correlated reference is actually to the tree containing parent_tree
.
@@ -1,8 +1,10 @@ | |||
ROOT(columns=[('name', name), ('n_prefix_nations', n_prefix_nations)], orderings=[(ordering_1):asc_first]) | |||
PROJECT(columns={'n_prefix_nations': n_prefix_nations, 'name': name, 'ordering_1': name}) | |||
PROJECT(columns={'n_prefix_nations': DEFAULT_TO(agg_0, 0:int64), 'name': name}) | |||
JOIN(conditions=[t0.key == t1.region_key], types=['left'], columns={'agg_0': t1.agg_0, 'name': t0.name}, correl_name='corr1') | |||
JOIN(conditions=[t0.key == t1.key], types=['left'], columns={'agg_0': t1.agg_0, 'name': t0.name}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a "simple" example of how the de-correaltion works.
- The left subtree (region) became a part of the right subtree (nation) before the correlated reference (the filter) occurred. This results in selecting region twice, but that's an efficiency problem for another day.
- The join is now on the uniquness keys of the left subtree: the region's
.key
property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general thoughts about hashes and equality.
): | ||
# These patterns do not require decorrelation since they | ||
# are supported via correlated SEMI/ANTI joins. | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a default "case _:" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because these are exhaustive with regards to the enum. If any new case is generated, mypy will throw an exception because these branches are no longer exhaustive. So here, we can rely on mypy tooling to ensure quality.
return self.equals(other) | ||
|
||
def __hash__(self) -> int: | ||
return hash(repr(self)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any other way we could generate this hash? If someone changes repr inadvertently somewhere else it could break things. Not sure it is something we should be worried about at this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of a better way off the top of my head since the best way to hash would be in a manner that depends on the name of the function/operator, which is always inside of its repr string (but each class handles that naming differently).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few Typos, but LGTM!
Child PR of #269 that is part of addressing #141. Adds the handling of de-correlation as a post-processing step of the hybrid tree before relational conversion starts. Handles the singular & aggregate cases as described in the issue, and handles the semi-singular and semi-aggregate cases in the same manner as their non-semi equivalents (for now), saving the optimized variants for another day.
These remaining still need to be addressed in the next follow up: