Support uses of BACK that cause correlated references: fix remaining decorrelation edge cases #254

Merged: 166 commits, Feb 18, 2025
deeb914
Starting function list documentation
knassre-bodo Jan 13, 2025
5d6c513
Adding datetime functions and bad boolean tests
knassre-bodo Jan 13, 2025
170492e
Merge branch 'main' into kian/function_docs
knassre-bodo Jan 13, 2025
8b8c098
Adding remaining functions including agg/window
knassre-bodo Jan 13, 2025
cb83f1f
Adding toc
knassre-bodo Jan 13, 2025
ef39e6e
Adding toc
knassre-bodo Jan 13, 2025
3c79543
Fixing typo [RUN CI]
knassre-bodo Jan 13, 2025
3db13d1
Started DSL documentation
knassre-bodo Jan 13, 2025
8dc17b3
Adding calc, contextless, and back
knassre-bodo Jan 13, 2025
044c217
Added TOC and TODOs
knassre-bodo Jan 13, 2025
0252cee
Added TOC and TODOs
knassre-bodo Jan 13, 2025
98c722b
Changing highlighting
knassre-bodo Jan 13, 2025
aa86d31
Merge branch 'kian/function_docs' into kian/dsl_docs
knassre-bodo Jan 13, 2025
b03a5ec
Addded more examples
knassre-bodo Jan 13, 2025
b569bfb
Added WHERE, ORDER_BY, and TOP_K documentation
knassre-bodo Jan 14, 2025
7607fea
Fixing typo
knassre-bodo Jan 14, 2025
215735b
Added extra example
knassre-bodo Jan 14, 2025
0b8e1fb
Update pydough/unqualified/unqualified_node.py
knassre-bodo Jan 14, 2025
a5190f8
Update documentation/functions.md
knassre-bodo Jan 14, 2025
a0d43f3
Update documentation/functions.md
knassre-bodo Jan 14, 2025
24c1e4a
Update documentation/functions.md
knassre-bodo Jan 14, 2025
8009e05
Update documentation/functions.md
knassre-bodo Jan 14, 2025
5d87b59
Update documentation/functions.md
knassre-bodo Jan 14, 2025
a3ee536
Update documentation/functions.md
knassre-bodo Jan 14, 2025
9945752
Update documentation/functions.md
knassre-bodo Jan 14, 2025
4c6aa79
Update documentation/functions.md
knassre-bodo Jan 14, 2025
e60dcc3
Update documentation/functions.md
knassre-bodo Jan 14, 2025
a23c395
Update documentation/functions.md
knassre-bodo Jan 14, 2025
2515683
Update documentation/functions.md
knassre-bodo Jan 14, 2025
d34c64c
Update documentation/functions.md
knassre-bodo Jan 14, 2025
cd59dc2
Update documentation/functions.md
knassre-bodo Jan 14, 2025
92778e3
Update documentation/functions.md
knassre-bodo Jan 14, 2025
6003ed0
Update documentation/functions.md
knassre-bodo Jan 14, 2025
fb86e3a
Update documentation/functions.md
knassre-bodo Jan 14, 2025
ca6b653
Updating arithmetic documentaiton and LIKE link
knassre-bodo Jan 14, 2025
22dfb32
Updating numerical operator warning
knassre-bodo Jan 14, 2025
e550ae4
Added function list checking test and 3 missing functions
knassre-bodo Jan 14, 2025
f7af1a9
Updated some explanations
knassre-bodo Jan 14, 2025
3e0b62c
Merge branch 'kian/function_docs' into kian/dsl_docs
knassre-bodo Jan 14, 2025
6a66c5f
Started PARTITION docs, still need to add a few more bad examples
knassre-bodo Jan 14, 2025
19cfd47
Added expressions, more bad partition examples, and NEXT/PREV
knassre-bodo Jan 14, 2025
acba2c6
Started BEST documentation, still need to do bad next/prev/best examples
knassre-bodo Jan 14, 2025
3f0b703
[RUN CI]
knassre-bodo Jan 16, 2025
fb25054
Merge branch 'kian/function_docs' into kian/dsl_docs
knassre-bodo Jan 16, 2025
eb81179
Merge branch 'main' into kian/dsl_docs
knassre-bodo Jan 17, 2025
0f5df29
Added bad next/prev examples
knassre-bodo Jan 17, 2025
9752b92
Added bad next/prev examples
knassre-bodo Jan 17, 2025
223c1b5
Adding examples and fixing 911 bugs to AST/hybrid handling [RUN CI]
knassre-bodo Jan 21, 2025
7b142eb
Updated TOC
knassre-bodo Jan 21, 2025
947d405
Merge branch 'main' into kian/dsl_docs
knassre-bodo Jan 21, 2025
df7e1b0
Adding 911 bugfix for partition and corresponding tests [RUN CI]
knassre-bodo Jan 21, 2025
3f167a4
Fixing mkglot examples [RUN CI]
knassre-bodo Jan 21, 2025
09f2bfe
Added extra DSL example comments
knassre-bodo Jan 21, 2025
30536e8
Added extra DSL example comments
knassre-bodo Jan 21, 2025
261f869
Updating alias counter
knassre-bodo Jan 22, 2025
d5b315e
Merge branch 'kian/dsl_docs' into kian/fix_extra_joins
knassre-bodo Jan 22, 2025
3794707
Adding triple partition test
knassre-bodo Jan 22, 2025
c04f34d
Adjiusting triple_partition test [RUN CI]
knassre-bodo Jan 22, 2025
39ce2ac
Fixing unit test [RUN CI]
knassre-bodo Jan 22, 2025
28065c0
Update documentation/dsl.md
knassre-bodo Jan 22, 2025
d88192f
Update documentation/dsl.md
knassre-bodo Jan 22, 2025
fed91e5
Added extra WHERE examples
knassre-bodo Jan 23, 2025
7d9492d
Revisions
knassre-bodo Jan 23, 2025
2ef58e3
Update documentation/dsl.md
knassre-bodo Jan 23, 2025
b904f3a
Update documentation/dsl.md
knassre-bodo Jan 23, 2025
7f69536
Updating capitalization
knassre-bodo Jan 23, 2025
fb19842
Resolving conflicts
knassre-bodo Jan 23, 2025
d1ffdb5
Apply suggestions from code review
knassre-bodo Jan 23, 2025
91e2a74
Extra revisions
knassre-bodo Jan 23, 2025
8b9db1e
More plural fixes
knassre-bodo Jan 23, 2025
2c29a3d
Merge branch 'kian/dsl_docs' into kian/fix_extra_joins
knassre-bodo Jan 23, 2025
e8b0cdb
Refactor agg call handling
knassre-bodo Jan 23, 2025
a5d8c1f
WIP progress on correlated references
knassre-bodo Jan 24, 2025
836adca
Initially working implementaiton of relational handling, still need t…
knassre-bodo Jan 27, 2025
9c72887
Merge branch 'main' into kian/corrleated_backref
knassre-bodo Jan 27, 2025
49f46d8
Rolling back SQLGlot changes and ensuring correl names are only used …
knassre-bodo Jan 27, 2025
81d130e
Added SQLGlot support for correlated references; working only with ex…
knassre-bodo Jan 27, 2025
60fea67
Adding additional tests, have a plan for how to deal with the 4 major…
knassre-bodo Jan 27, 2025
9980a2c
Pulling down changes
knassre-bodo Jan 28, 2025
4c54d57
Pulling out SQLGlot changes into followup PR
knassre-bodo Jan 28, 2025
7a3fd30
Resolving conflicts
knassre-bodo Jan 28, 2025
b519cd0
Added two more correlated backref edge cases
knassre-bodo Jan 30, 2025
4033936
Fixing backref name conflict bug
knassre-bodo Jan 31, 2025
c2bbb59
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Jan 31, 2025
8fcc609
Confirmed all correl queries except 1, 2, 3, 6, 8 and 9 are working
knassre-bodo Jan 31, 2025
8a8e49f
Added multi-correlate example
knassre-bodo Jan 31, 2025
4eeadd9
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Jan 31, 2025
9427cce
Finished adding/refining complex correlation tests
knassre-bodo Feb 3, 2025
397f40d
Pulling up testing changes
knassre-bodo Feb 3, 2025
71ca0e2
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 3, 2025
719eec5
Adding two more correl tests
knassre-bodo Feb 3, 2025
2289aaf
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 3, 2025
52c2af5
Fixing correl 16
knassre-bodo Feb 3, 2025
19b3628
Fixing correlated test #16
knassre-bodo Feb 3, 2025
600616e
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 3, 2025
b6c9b85
WIP handling renamings
knassre-bodo Feb 3, 2025
afd241b
Converted qdag conversion tests to be plan file based
knassre-bodo Feb 3, 2025
ceff105
Converting pipeline files to new format
knassre-bodo Feb 3, 2025
64311bc
[RUN CI]
knassre-bodo Feb 3, 2025
26b21a2
Mass updating using PYDOUGH_UPDATE_TESTS
knassre-bodo Feb 3, 2025
8d6878f
Adding comments [RUN CI]
knassre-bodo Feb 3, 2025
06dd69f
Porting correl tests to planner files
knassre-bodo Feb 3, 2025
6949cdd
Revisions [RUN CI]
knassre-bodo Feb 4, 2025
7afe8ad
Resolving conflicts
knassre-bodo Feb 4, 2025
8184a7e
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 4, 2025
318eb1b
Resolving conflicts
knassre-bodo Feb 6, 2025
45fea5b
Resolving conflicts
knassre-bodo Feb 6, 2025
7bb353f
Resolving conflicts
knassre-bodo Feb 6, 2025
0cf4c11
Adding documentation
knassre-bodo Feb 6, 2025
3a1516e
Adding more comments
knassre-bodo Feb 6, 2025
746eed5
Added more comments [RUN CI]
knassre-bodo Feb 6, 2025
4818c59
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 6, 2025
17116d4
Initial handling of decorrelation setup
knassre-bodo Feb 6, 2025
9ab9bed
Adding decorrelater file
knassre-bodo Feb 6, 2025
a8f6535
Renaming and added comment
knassre-bodo Feb 6, 2025
cf8cc50
Merge branch 'main' into kian/corrleated_backref
knassre-bodo Feb 6, 2025
7556b5e
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 6, 2025
37ebdb4
Merge branch 'kian/correlated_backref_2' into kian/correlated_backref_3
knassre-bodo Feb 6, 2025
8ff5157
Implented singular decorrelation handling
knassre-bodo Feb 7, 2025
9191ec4
Fixing aggregation for singular case
knassre-bodo Feb 7, 2025
9bc3564
WIP handling edge cases and aggregation; need to fix tpch q5/q22, and…
knassre-bodo Feb 7, 2025
67a342e
Updating plans for newly working correl queries 6/9/17
knassre-bodo Feb 7, 2025
bb24b30
Bugfixes to decorrelation, compressing helper functions
knassre-bodo Feb 9, 2025
8f83903
Filling vlaue for correl tests 1/2/3
knassre-bodo Feb 9, 2025
b096443
Pulling up testing completion changes
knassre-bodo Feb 9, 2025
e9f2606
Updating refsols
knassre-bodo Feb 9, 2025
9772f61
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 9, 2025
ec93596
Resolving conflicts
knassre-bodo Feb 9, 2025
0a00cea
Resolving issues for correl #3
knassre-bodo Feb 10, 2025
d1e1520
Fixing correl #3 test output
knassre-bodo Feb 10, 2025
fd37df1
Moving correlated tests to their own file
knassre-bodo Feb 10, 2025
a75bba5
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 10, 2025
8d75668
Merge branch 'kian/correlated_backref_2' into kian/correlated_backref_3
knassre-bodo Feb 10, 2025
2cfe999
Added documentation
knassre-bodo Feb 10, 2025
322c722
Handling pruning edgecase
knassre-bodo Feb 10, 2025
1e44262
Updating plans
knassre-bodo Feb 10, 2025
4203c6c
Adding and fixing correl 18
knassre-bodo Feb 10, 2025
5b6e9b3
Adding test 18 file
knassre-bodo Feb 10, 2025
b480c3b
Adding correl test 19 and fixing hybrid tree bug
knassre-bodo Feb 10, 2025
c75b9fd
Adding correl test 20
knassre-bodo Feb 10, 2025
5a55ac5
Pulling up downstream test changes
knassre-bodo Feb 10, 2025
ad61d27
Pulling up downstream test changes
knassre-bodo Feb 10, 2025
07d771d
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 10, 2025
ba095ec
Resolving conflicts
knassre-bodo Feb 10, 2025
e82c9a1
Resolving conflicts
knassre-bodo Feb 10, 2025
639a56b
Adding clarity docs and removing ordering based nondeterminism
knassre-bodo Feb 10, 2025
01e6d97
Resolving bug in level skipping to get query 22 online
knassre-bodo Feb 10, 2025
eab09e9
Cleanup
knassre-bodo Feb 10, 2025
7ef0476
Cleanup
knassre-bodo Feb 10, 2025
1246ba5
Merge branch 'kian/correlated_backref_3' into kian/correlated_backref_4
knassre-bodo Feb 10, 2025
1bd9b07
[RUN CI]
knassre-bodo Feb 10, 2025
c83b67e
Removing dead code
knassre-bodo Feb 10, 2025
3fbcc1f
Merge branch 'kian/correlated_backref_3' into kian/correlated_backref_4
knassre-bodo Feb 10, 2025
fdbf909
Revisions [RUN CI]
knassre-bodo Feb 17, 2025
ec77c0d
Merge branch 'kian/correlation_omnibus' into kian/corrleated_backref
knassre-bodo Feb 17, 2025
a5531e3
Revisions
knassre-bodo Feb 17, 2025
4becac1
Merge branch 'kian/corrleated_backref' into kian/correlated_backref_2
knassre-bodo Feb 17, 2025
c4af826
Resolve conflicts
knassre-bodo Feb 17, 2025
f44ba0e
Merge branch 'kian/correlated_backref_2' into kian/correlated_backref_3
knassre-bodo Feb 17, 2025
11ac86d
Merge branch 'kian/correlation_omnibus' into kian/correlated_backref_3
knassre-bodo Feb 17, 2025
39da637
Merge branch 'kian/correlated_backref_3' into kian/correlated_backref_4
knassre-bodo Feb 17, 2025
8b74502
Added 3 more correlation tests, special PARTITION handling fixes, and…
knassre-bodo Feb 17, 2025
32074d9
[RUN CI]
knassre-bodo Feb 17, 2025
2105bc8
Fixing typos
knassre-bodo Feb 18, 2025
d6ba5a1
Resolving conflicts
knassre-bodo Feb 18, 2025
50bad40
Merge branch 'kian/correlation_omnibus' into kian/correlated_backref_4
knassre-bodo Feb 18, 2025
77 changes: 56 additions & 21 deletions pydough/conversion/hybrid_decorrelater.py
Expand Up @@ -34,7 +34,7 @@ class Decorrelater:

def make_decorrelate_parent(
self, hybrid: HybridTree, child_idx: int, required_steps: int
) -> HybridTree:
) -> tuple[HybridTree, int]:
"""
Creates a snapshot of the ancestry of the hybrid tree that contains
a correlated child, without any of its children, its descendants, or
Expand All @@ -50,21 +50,27 @@ def make_decorrelate_parent(
derivable.

Returns:
A snapshot of `hybrid` and its ancestry in the hybrid tree, without
without any of its children or pipeline operators that occur during
or after the derivation of the correlated child, or without any of
its descendants.
A tuple where the first entry is a snapshot of `hybrid` and its
ancestry in the hybrid tree, without any of its children or
pipeline operators that occur during or after the derivation of the
correlated child, or without any of its descendants. The second
entry is the number of ancestor layers that should be skipped due
to the PARTITION edge case.
"""
if isinstance(hybrid.pipeline[0], HybridPartition) and child_idx == 0:
# Special case: if the correlated child is the data argument of a
# partition operation, then the parent to snapshot is actually the
# parent of the level containing the partition operation. In this
# case, all of the parent's children & pipeline operators should be
# included in the snapshot.
assert hybrid.parent is not None
return self.make_decorrelate_parent(
if hybrid.parent is None:
raise ValueError(
"Malformed hybrid tree: partition data input to a partition node cannot contain a correlated reference to the partition node."
)
result = self.make_decorrelate_parent(
hybrid.parent, len(hybrid.parent.children), len(hybrid.pipeline)
)
return result[0], result[1] + 1
Comment on lines +70 to +73
Contributor Author:
Changing this to keep track of how many times we had to recursively step upward, because those levels should not be counted later.

# Temporarily detach the successor of the current level, then create a
# deep copy of the current level (which will include its ancestors),
# then reattach the successor back to the original. This ensures that
Expand All @@ -78,7 +84,7 @@ def make_decorrelate_parent(
# that it has to.
new_hybrid._children = new_hybrid._children[:child_idx]
new_hybrid._pipeline = new_hybrid._pipeline[: required_steps + 1]
return new_hybrid
return new_hybrid, 0
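
The change to `make_decorrelate_parent` above makes the recursion report how many levels it stepped over on the way up. A minimal sketch of that pattern follows, assuming a toy `Level` class and a `snapshot_parent` helper that are hypothetical stand-ins and not part of PyDough; it shows only the counting, not the snapshotting itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Level:
    """Hypothetical stand-in for one level of a hybrid tree."""

    name: str
    # Mimics the PARTITION edge case: the correlated child is the data
    # argument of a partition operation at this level.
    is_partition_data_case: bool = False
    parent: Level | None = None


def snapshot_parent(level: Level) -> tuple[str, int]:
    """Return (name of the level to snapshot, number of levels skipped)."""
    if level.is_partition_data_case:
        if level.parent is None:
            raise ValueError("Malformed tree: partition level has no parent.")
        name, skipped = snapshot_parent(level.parent)
        # One more level was stepped over while recursing upward.
        return name, skipped + 1
    # Base case: this level is the one to snapshot, nothing was skipped.
    return level.name, 0


root = Level("root")
part = Level("partition", is_partition_data_case=True, parent=root)
```

Here `snapshot_parent(part)` walks up one level before reaching its base case, so the caller learns that one ancestor layer should not be counted later.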

def remove_correl_refs(
self, expr: HybridExpr, parent: HybridTree, child_height: int
Expand Down Expand Up @@ -221,6 +227,7 @@ def decorrelate_child(
new_parent: HybridTree,
child: HybridConnection,
is_aggregate: bool,
skipped_levels: int,
) -> None:
"""
Runs the logic to de-correlate a child of a hybrid tree that contains
Expand All @@ -230,6 +237,17 @@ def decorrelate_child(
child can now replace correlated references with BACK references that
point to terms in its newly expanded ancestry, and the original hybrid
tree can now join onto this child using its uniqueness keys.

Args:
`old_parent`: The correlated ancestor hybrid tree that the correlated
references should point to when they are targeted for removal.
`new_parent`: The ancestor of `level` that removal should stop at.
`child`: The child of the hybrid tree that contains the correlated
nodes to be removed.
`is_aggregate`: Whether the child is being aggregated with regards
to its parent.
`skipped_levels`: The number of ancestor layers that should be
ignored when deriving backshifts of join/agg keys.
"""
# First, find the height of the child subtree & its top-most level.
child_root: HybridTree = child.subtree
Expand All @@ -245,28 +263,41 @@ def decorrelate_child(
new_join_keys: list[tuple[HybridExpr, HybridExpr]] = []
additional_levels: int = 0
current_level: HybridTree | None = old_parent
new_agg_keys: list[HybridExpr] = []
while current_level is not None:
for unique_key in current_level.pipeline[0].unique_exprs:
skip_join: bool = (
isinstance(current_level.pipeline[0], HybridPartition)
and child is current_level.children[0]
)
for unique_key in sorted(current_level.pipeline[0].unique_exprs, key=str):
lhs_key: HybridExpr | None = unique_key.shift_back(additional_levels)
rhs_key: HybridExpr | None = unique_key.shift_back(
additional_levels + child_height
additional_levels + child_height - skipped_levels
Contributor Author:
This "skipped_levels" comes into play in TPCH query 22, since the PARTITION should not be considered in the height since it is a level that gets skipped by the parent snapshotting function.

)
assert lhs_key is not None and rhs_key is not None
new_join_keys.append((lhs_key, rhs_key))
if not skip_join:
new_join_keys.append((lhs_key, rhs_key))
new_agg_keys.append(rhs_key)
current_level = current_level.parent
additional_levels += 1
child.subtree.join_keys = new_join_keys
# If aggregating, do the same with the aggregation keys.
# If aggregating, update the aggregation keys accordingly.
if is_aggregate:
new_agg_keys: list[HybridExpr] = []
assert child.subtree.join_keys is not None
for _, rhs_key in child.subtree.join_keys:
new_agg_keys.append(rhs_key)
child.subtree.agg_keys = new_agg_keys
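
The key derivation in `decorrelate_child` above can be sketched with plain tuples. This is an illustration under assumptions: each key is modelled as a `(name, back_levels)` pair rather than a `HybridExpr`, `derive_keys` is a hypothetical helper, and because the diff has lost its indentation, the sketch assumes the aggregation key is appended for every unique key while the join key is appended only when `skip_join` is false.

```python
def derive_keys(levels, child_height, skipped_levels):
    """Derive join and aggregation keys from a chain of ancestor levels.

    `levels` is a list of (unique_key_names, skip_join) pairs, ordered from
    the correlated parent upward. Keys are modelled as (name, back_levels).
    """
    join_keys = []
    agg_keys = []
    for additional_levels, (unique_names, skip_join) in enumerate(levels):
        # Sorting by string form keeps the key order deterministic.
        for name in sorted(unique_names, key=str):
            lhs = (name, additional_levels)
            # Skipped levels (the PARTITION case) do not count toward height.
            rhs = (name, additional_levels + child_height - skipped_levels)
            if not skip_join:
                join_keys.append((lhs, rhs))
            agg_keys.append(rhs)
    return join_keys, agg_keys


# Two ordinary ancestor levels, child subtree of height 2, nothing skipped.
plain = derive_keys([({"key"}, False), ({"region", "nation"}, False)], 2, 0)
# A partition level whose data argument is the child: no join key is emitted.
partitioned = derive_keys([({"cntry_code"}, True)], 2, 1)
```

In the second call the right-hand key is shifted back by one level instead of two, which is the effect the `skipped_levels` adjustment is described as having for TPCH query 22.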

def decorrelate_hybrid_tree(self, hybrid: HybridTree) -> HybridTree:
"""
TODO
The recursive procedure to remove unwanted correlated references from
the entire hybrid tree, called from the bottom and working upwards
to the top layer, and having each layer also de-correlate its children.

Args:
`hybrid`: The hybrid tree to remove correlated references from.

Returns:
The hybrid tree with all invalid correlated references removed as the
tree structure is re-written to allow them to be replaced with BACK
references. The transformation is also done in-place.
"""
# Recursively decorrelate the ancestors of the current level of the
# hybrid tree.
Expand All @@ -282,18 +313,22 @@ def decorrelate_hybrid_tree(self, hybrid: HybridTree) -> HybridTree:
for idx, child in enumerate(hybrid.children):
if idx not in hybrid.correlated_children:
continue
new_parent: HybridTree = self.make_decorrelate_parent(
hybrid, idx, hybrid.children[idx].required_steps
)
match child.connection_type:
case (
ConnectionType.SINGULAR
| ConnectionType.SINGULAR_ONLY_MATCH
| ConnectionType.AGGREGATION
| ConnectionType.AGGREGATION_ONLY_MATCH
):
new_parent, skipped_levels = self.make_decorrelate_parent(
hybrid, idx, hybrid.children[idx].required_steps
)
self.decorrelate_child(
hybrid, new_parent, child, child.connection_type.is_aggregation
hybrid,
new_parent,
child,
child.connection_type.is_aggregation,
skipped_levels,
)
case ConnectionType.NDISTINCT | ConnectionType.NDISTINCT_ONLY_MATCH:
raise NotImplementedError(
51 changes: 42 additions & 9 deletions pydough/conversion/hybrid_tree.py
Expand Up @@ -494,17 +494,26 @@ def __init__(
for name, expr in predecessor.terms.items():
terms[name] = HybridRefExpr(name, expr.typ)
renamings.update(predecessor.renamings)
new_renamings: dict[str, str] = {}
for name, expr in new_expressions.items():
if name in terms and terms[name] == expr:
continue
expr = expr.apply_renamings(predecessor.renamings)
used_name: str = name
idx: int = 0
while used_name in terms or used_name in renamings:
while (
used_name in terms
or used_name in renamings
or used_name in new_renamings
):
used_name = f"{name}_{idx}"
idx += 1
terms[used_name] = expr
renamings[name] = used_name
new_renamings[name] = used_name
renamings.update(new_renamings)
for old_name, new_name in new_renamings.items():
expr = new_expressions.pop(old_name)
new_expressions[new_name] = expr
Comment on lines +504 to +516
Contributor Author:
This change allows us to ensure that the terms in new_expressions also get renamed when necessary.

super().__init__(terms, renamings, orderings, predecessor.unique_exprs)
self.calc = Calc
self.new_expressions = new_expressions
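
The renaming logic in the hunk above can be sketched in isolation. This is a simplified illustration, assuming expressions are plain strings and that `resolve_names` is a hypothetical helper rather than PyDough code; it omits the `apply_renamings` step and returns a new dictionary instead of mutating `new_expressions` in place.

```python
def resolve_names(existing_terms, renamings, new_expressions):
    """Assign conflict-free names to new expressions.

    Returns (terms, renamings, renamed_expressions), where the last entry
    has the same values as `new_expressions` but keyed by the final names.
    """
    terms = dict(existing_terms)
    renamings = dict(renamings)
    new_renamings = {}
    for name, expr in new_expressions.items():
        if name in terms and terms[name] == expr:
            # Same name and same expression: nothing to rename.
            continue
        used_name = name
        idx = 0
        while (
            used_name in terms
            or used_name in renamings
            or used_name in new_renamings
        ):
            used_name = f"{name}_{idx}"
            idx += 1
        terms[used_name] = expr
        new_renamings[name] = used_name
    renamings.update(new_renamings)
    # Re-key the new expressions so they agree with the names in `terms`.
    renamed = {new_renamings.get(n, n): e for n, e in new_expressions.items()}
    return terms, renamings, renamed


terms, renamings, renamed = resolve_names(
    {"x": "col_x"}, {}, {"x": "col_x + 1", "y": "col_y"}
)
```

In this example the new `x` collides with an existing term, so it is stored as `x_0`, and the re-keyed expressions use `x_0` as well.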
Expand All @@ -520,7 +529,10 @@ class HybridFilter(HybridOperation):

def __init__(self, predecessor: HybridOperation, condition: HybridExpr):
super().__init__(
predecessor.terms, {}, predecessor.orderings, predecessor.unique_exprs
Contributor Author:
The use of {} here caused a bug in name handling, caught by correl_19.

predecessor.terms,
predecessor.renamings,
predecessor.orderings,
predecessor.unique_exprs,
)
self.predecessor: HybridOperation = predecessor
self.condition: HybridExpr = condition
Expand Down Expand Up @@ -566,7 +578,10 @@ def __init__(
records_to_keep: int,
):
super().__init__(
predecessor.terms, {}, predecessor.orderings, predecessor.unique_exprs
predecessor.terms,
predecessor.renamings,
predecessor.orderings,
predecessor.unique_exprs,
)
self.predecessor: HybridOperation = predecessor
self.records_to_keep: int = records_to_keep
Expand Down Expand Up @@ -908,13 +923,13 @@ def __repr__(self):
lines.append(" -> ".join(repr(operation) for operation in self.pipeline))
prefix = " " if self.successor is None else "↓"
for idx, child in enumerate(self.children):
lines.append(f"{prefix} child #{idx}:")
lines.append(f"{prefix} child #{idx} ({child.connection_type.name}):")
if child.subtree.agg_keys is not None:
lines.append(
f"{prefix} aggregate: {child.subtree.agg_keys} -> {child.aggs}:"
)
lines.append(f"{prefix} aggregate: {child.subtree.agg_keys}")
if len(child.aggs):
lines.append(f"{prefix} aggs: {child.aggs}:")
if child.subtree.join_keys is not None:
lines.append(f"{prefix} join: {child.subtree.join_keys}:")
lines.append(f"{prefix} join: {child.subtree.join_keys}")
for line in repr(child.subtree).splitlines():
lines.append(f"{prefix} {line}")
return "\n".join(lines)
Expand Down Expand Up @@ -1964,6 +1979,24 @@ def make_hybrid_tree(
rhs_expr.name, 0, rhs_expr.typ
)
join_key_exprs.append((lhs_expr, rhs_expr))

case PartitionBy():
partition = HybridPartition()
successor_hybrid = HybridTree(partition)
self.populate_children(
successor_hybrid, node.child_access, child_ref_mapping
)
partition_child_idx = child_ref_mapping[0]
for key_name in node.calc_terms:
key = node.get_expr(key_name)
expr = self.make_hybrid_expr(
successor_hybrid, key, child_ref_mapping, False
)
partition.add_key(key_name, expr)
key_exprs.append(HybridRefExpr(key_name, expr.typ))
successor_hybrid.children[
partition_child_idx
].subtree.agg_keys = key_exprs
case _:
raise NotImplementedError(
f"{node.__class__.__name__} (child is {node.child_access.__class__.__name__})"
20 changes: 18 additions & 2 deletions pydough/conversion/relational_converter.py
Expand Up @@ -885,11 +885,26 @@ def rel_translation(
if isinstance(operation.collection, TableCollection):
result = self.build_simple_table_scan(operation)
if context is not None:
# If the collection access is the child of something
# else, join it onto that something else. Use the
# uniqueness keys of the ancestor, which should also be
# present in the collection (e.g. joining a partition
# onto the original data using the partition keys).
assert preceding_hybrid is not None
join_keys: list[tuple[HybridExpr, HybridExpr]] = []
for unique_column in sorted(
preceding_hybrid[0].pipeline[0].unique_exprs, key=str
Contributor Author (@knassre-bodo, Feb 10, 2025):
This case arises from the correl_18 query.

):
if unique_column not in result.expressions:
raise ValueError(
f"Cannot connect parent context to child {operation.collection} because {unique_column} is not in the child's expressions."
)
join_keys.append((unique_column, unique_column))
result = self.join_outputs(
context,
result,
JoinType.INNER,
[],
join_keys,
None,
)
else:
Expand All @@ -913,7 +928,8 @@ def rel_translation(
assert context is not None, "Malformed HybridTree pattern."
result = self.translate_filter(operation, context)
case HybridPartition():
assert context is not None, "Malformed HybridTree pattern."
if context is None:
context = TranslationOutput(EmptySingleton(), {})
result = self.translate_partition(
operation, context, hybrid, pipeline_idx
)
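
The join-key construction added to the table collection case earlier in this hunk can be sketched as follows. It assumes column names are plain strings and that `build_join_keys` is a hypothetical helper; the real code works on `HybridExpr` objects and `result.expressions`.

```python
def build_join_keys(parent_unique, child_columns):
    """Pair each uniqueness key of the parent with the same column in the child."""
    join_keys = []
    # Sorting by string form keeps the generated plan deterministic.
    for column in sorted(parent_unique, key=str):
        if column not in child_columns:
            raise ValueError(
                f"Cannot connect parent context to child because {column} "
                "is not in the child's expressions."
            )
        join_keys.append((column, column))
    return join_keys


keys = build_join_keys({"region", "nation"}, {"nation", "region", "name"})
```

Failing loudly when a key is missing mirrors the `ValueError` in the diff, in place of silently joining on an empty key list as the previous `[]` argument did.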
21 changes: 19 additions & 2 deletions pydough/relational/relational_nodes/column_pruner.py
Expand Up @@ -11,7 +11,8 @@

from .abstract_node import RelationalNode
from .aggregate import Aggregate
from .join import Join
from .empty_singleton import EmptySingleton
from .join import Join, JoinType
from .project import Project
from .relational_expression_dispatcher import RelationalExpressionDispatcher
from .relational_root import RelationalRoot
Expand Down Expand Up @@ -141,7 +142,23 @@ def _prune_node_columns(

# Determine the new node.
output = new_node.copy(inputs=new_inputs)
return self._prune_identity_project(output), correl_refs
output = self._prune_identity_project(output)
# Special case: replace empty aggregation with VALUES () if possible.
Contributor Author:
These special cases can arise because the decorrelation can cause the LHS of the join to have 100% of its columns unused, which is problematic.

if (
isinstance(output, Aggregate)
and len(output.keys) == 0
and len(output.aggregations) == 0
):
return EmptySingleton(), correl_refs
# Special case: replace join where LHS is VALUES () with the RHS if
# possible.
if (
isinstance(output, Join)
and isinstance(output.inputs[0], EmptySingleton)
and output.join_types in ([JoinType.INNER], [JoinType.LEFT])
):
return output.inputs[1], correl_refs
return output, correl_refs
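
The two pruning special cases above can be sketched with toy node classes. The dataclasses below only borrow the names used in the diff and are hypothetical stand-ins, with join types modelled as strings; `simplify` is likewise an illustrative helper, not the pruner's API.

```python
from dataclasses import dataclass, field


@dataclass
class EmptySingleton:
    """Toy stand-in for a relation with one row and no columns."""


@dataclass
class Scan:
    table: str


@dataclass
class Aggregate:
    keys: list = field(default_factory=list)
    aggregations: list = field(default_factory=list)


@dataclass
class Join:
    inputs: list
    join_types: list


def simplify(node):
    """Apply the two special cases after column pruning."""
    # An aggregate with no keys and no aggregations yields one empty row.
    if isinstance(node, Aggregate) and not node.keys and not node.aggregations:
        return EmptySingleton()
    # An inner or left join whose LHS is one empty row is just its RHS.
    if (
        isinstance(node, Join)
        and isinstance(node.inputs[0], EmptySingleton)
        and node.join_types in (["INNER"], ["LEFT"])
    ):
        return node.inputs[1]
    return node


pruned = simplify(Join([simplify(Aggregate()), Scan("orders")], ["INNER"]))
```

Other join types are left untouched in this sketch, matching the restriction to `INNER` and `LEFT` in the diff.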

def prune_unused_columns(self, root: RelationalRoot) -> RelationalRoot:
"""
9 changes: 7 additions & 2 deletions pydough/sqlglot/sqlglot_relational_visitor.py
Expand Up @@ -431,6 +431,7 @@ def visit_filter(self, filter: Filter) -> None:
# TODO: (gh #151) Refactor a simpler way to check dependent expressions.
if (
"group" in input_expr.args
or "distinct" in input_expr.args
or "where" in input_expr.args
or "qualify" in input_expr.args
or "order" in input_expr.args
Expand Down Expand Up @@ -462,6 +463,7 @@ def visit_aggregate(self, aggregate: Aggregate) -> None:
query: Select
if (
"group" in input_expr.args
or "distinct" in input_expr.args
or "qualify" in input_expr.args
or "order" in input_expr.args
or "limit" in input_expr.args
Expand All @@ -472,7 +474,10 @@ def visit_aggregate(self, aggregate: Aggregate) -> None:
select_cols, input_expr, find_identifiers_in_list(select_cols)
)
if keys:
query = query.group_by(*keys)
if aggregations:
query = query.group_by(*keys)
else:
query = query.distinct()
Comment on lines +477 to +480
Contributor Author:
This is so if we have an aggregate on keys A, B, C w/o any aggregation functions, we can just do SELECT DISTINCT A, B, C FROM (...)

self._stack.append(query)
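
The choice between `GROUP BY` and `DISTINCT` described in the review comment above can be sketched with plain string building. This deliberately avoids the SQLGlot API: `render_aggregate` is a hypothetical helper and the SQL text it emits is only illustrative.

```python
def render_aggregate(source, keys, aggregations):
    """Render a keyed aggregate, using DISTINCT when there is nothing to aggregate."""
    columns = ", ".join(keys + aggregations)
    if not columns:
        # Nothing between SELECT and FROM would be invalid SQL.
        raise ValueError("An aggregate needs at least one key or aggregation.")
    if keys and not aggregations:
        # Keys only: deduplicating the key columns is equivalent to grouping.
        return f"SELECT DISTINCT {columns} FROM {source}"
    query = f"SELECT {columns} FROM {source}"
    if keys:
        query += " GROUP BY " + ", ".join(keys)
    return query


distinct_sql = render_aggregate("t", ["a", "b", "c"], [])
grouped_sql = render_aggregate("t", ["a"], ["COUNT(*) AS n"])
```

The first call produces a `SELECT DISTINCT` over the three keys, while the second keeps the usual `GROUP BY` form because an aggregation function is present.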

def visit_limit(self, limit: Limit) -> None:
Expand Down Expand Up @@ -511,7 +516,7 @@ def visit_limit(self, limit: Limit) -> None:
self._stack.append(query)

def visit_empty_singleton(self, singleton: EmptySingleton) -> None:
self._stack.append(Select().from_(values([()])))
self._stack.append(Select().select(SQLGlotStar()).from_(values([()])))
Comment on lines -514 to +519
Contributor Author:
Without this change, you can end up with SELECT FROM ... clauses (without anything between SELECT and FROM), which is invalid.


def visit_root(self, root: RelationalRoot) -> None:
self.visit_inputs(root)
7 changes: 6 additions & 1 deletion pydough/unqualified/qualification.py
Expand Up @@ -676,7 +676,12 @@ def qualify_partition(
partition: PartitionBy = self.builder.build_partition(
qualified_parent, qualified_child, child_name
)
return partition.with_keys(child_references)
partition = partition.with_keys(child_references)
# Special case: if accessing as a child, wrap in a
# ChildOperatorChildAccess term.
if isinstance(unqualified_parent, UnqualifiedRoot) and is_child:
return ChildOperatorChildAccess(partition)
return partition

def qualify_collection(
self,