-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Push down MvExpand past Project
#136398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ESQL: Push down MvExpand past Project
#136398
Conversation
MvExpand past `ProjectMvExpand past Project
...n/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownMvExpandPastProject.java
Outdated
Show resolved
Hide resolved
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
alex-spies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @kanoshiou !
I think this is very nice, but I think there's an approach that doesn't create defensive Evals to preserve the attribute. I'd like to explore that approach first before giving up on it.
...n/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownMvExpandPastProject.java
Outdated
Show resolved
Hide resolved
...n/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownMvExpandPastProject.java
Outdated
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| Project project = PushDownUtils.pushDownPastProject(mvExpand); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be simpler to just put a brand new projection at the top based on mvExpand.output(), but plugging back in the aliases that the original projection created. That's how we do it in ReplaceAliasingEvalWithProject (although that code is a little complex to use as reference :/ )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to take my own advice back, sorry! Given that we already update the list of projections, I think going back to mvExpand.output() makes the code more confusing, and it's likely simpler to just use the updated projections list directly to create the new Projection that goes on top of the mvExpand.
|
Thanks for your review @alex-spies! I believe aliases are not only generated by renames and an alias’s source value may still be used elsewhere. For example: In this case, |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
- Enhance PushDownMvExpandPastProject rule to handle more complex query scenarios - Add checks to prevent duplicate output attributes when pushing down MvExpand - Support pushing down MvExpand for queries with aliased expressions - Handle cases where target is an alias or referenced in an alias - Add test case to verify complex MvExpand push-down behavior Resolves edge cases in multi-stage query optimization where MvExpand interactions with Project and Eval nodes were previously limited.
- Refactored PushDownMvExpandPastProject to handle more complex projection scenarios - Added support for preserving original projection aliases during MvExpand push-down - Replaced previous push-down implementation with more robust alias resolution - Ensured that output expressions maintain their original context and aliases - Updated test case to reflect new push-down behavior The changes improve the flexibility and accuracy of MvExpand push-down optimization in the logical plan optimizer.
- Simplified push-down logic for MvExpand past Project operations - Removed unnecessary Limit operation from example comment - Used `replaceChild()` method instead of creating a new MvExpand instance - Streamlined code for pushing down MvExpand past Project nodes - Improved code readability and reduced complexity of optimizer rule
- Modify PushDownMvExpandPastProject to handle more complex projection scenarios - Update PushDownUtils to make resolveRenamesFromProject method public - Enhance test cases in LogicalPlanOptimizerTests to validate new push-down behavior - Simplify MvExpand and Project interaction by removing unnecessary Eval steps - Improve optimization logic for multi-stage MvExpand operations
…astProject - Refactor condition in semantic equality check for MvExpand push-down - Modify logic to handle alias comparisons more precisely - Simplify condition for checking input references in project operations - Improve code readability and reduce complexity of push-down rule
luigidellaquila
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to wrap my head around this problem, and I came to the same conclusion as @kanoshiou: MV_EXPAND drops the source attribute, so we can't just move the aliasing after that, or we'll lose the original value.
This said, the combination of this new rule with ReplaceAliasingEvalWithProject tends to be confusing, because one reverts the other. Right now it's not problematic, because ReplaceAliasingEvalWithProject is in the Substitutions phase, while this new rule is in the Operator Optimizations phase, so they never run in a loop. If they did, we'd have an infinite loop.
Comments like this make me think that the problematic bit here is ReplaceAliasingEvalWithProject, and aliases in Project in general
(btw, we have at least another rule that puts aliases into projections that is PushDownJoinPastProject), but this makes the topic too complex probably.
On a side note, ReplaceAliasingEvalWithProject looks like a nice optimization, but I have no evidence that it is effective in practice (do we have a benchmark for it?). I guess an EVAL a = b could be easily optimized at compute level, just using a single block for both.
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
alex-spies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kanoshiou , you were right about the need for the defensive Evals.
I think this approach will work and have done another round of review. I think we can deal with the UnionAll problem, see below.
| testPushDownMvExpandPastProject2 | ||
| required_capability: push_down_mv_expand_past_project | ||
| from languages | ||
| | eval a = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we could add some tests where the expanded value is actually an MV, like eval = [1,2].
...n/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownMvExpandPastProject.java
Outdated
Show resolved
Hide resolved
| // MvExpand[salary{r}#168,salary{r}#175] | ||
| // \_Project[[$$salary$converted_to$keyword{r$}#178 AS salary#168]] | ||
| // \_UnionAll[[salary{r}#174, $$salary$converted_to$keyword{r$}#178]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, don't we just need to create a new expanded attribute?
We can create a new (synthetic) reference attribute $$temp$name$#200. Then we'd turn this into
Project[[$$temp$name$#200 AS salary#175]]
\_MvExpand[$$salary$converted_to$keyword{r$}#178,$$temp$name$#200]
\_UnionAll[[salary{r}#174, $$salary$converted_to$keyword{r$}#178]]
without defensive evals.
With a defensive eval, I think this would be
Project[[$$expanded$temp$name#200 AS salary#175]]
\_MvExpand[$$target$temp$name#199, $$expanded$temp$name#200]
\_Eval[$$salary$converted_to$keyword{r$}#178 AS $$target$temp$name#199]
\_UnionAll[[salary{r}#174, $$salary$converted_to$keyword{r$}#178]]
| // Check if the alias's original field (child) is referenced elsewhere in the projections. | ||
| // If the original field is not referenced by any other projection or alias, | ||
| // we don't need to inject an Eval to preserve it, and can safely resolve renames and push down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very helpful comments, btw!
| if (projections.stream().anyMatch(e -> e instanceof Alias alias && inputNames.contains(alias.toAttribute().name()))) { | ||
| return mvExpand; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is too broad. We only get a problem with duplicate output attributes if
- the expanded field has the same name as a field in the projections input set, and
- the projection shadows that specific field from the projection input set.
| projections.set(i, mvExpand.expanded()); | ||
| pj = new Project(pj.source(), new Eval(aliasAlias.source(), pj.child(), List.of(aliasAlias)), projections); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing. The projection at i is the expanded attribute, but this only exists after the mvExpand, so this Project here is inconsistent. This looks like a bug and I don't understand how the code in the next code block (lines 84 and below) works based on this inconsistent state.
It looks to me like the projections list was meant to become the final projections that we put on top of the mv expand. This is also a valid approach, and this code seems to nearly do it - the only thing left to do is to update the projections in the case where the target field is an un-aliased projection.
| } | ||
| } | ||
|
|
||
| // Build a map of aliases from the original projection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The projection pj can be updated by the code above. I can't verify that using the aliases from pj is the right thing here.
| } | ||
| } | ||
|
|
||
| Project project = PushDownUtils.pushDownPastProject(mvExpand); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to take my own advice back, sorry! Given that we already update the list of projections, I think going back to mvExpand.output() makes the code more confusing, and it's likely simpler to just use the updated projections list directly to create the new Projection that goes on top of the mvExpand.
| // Create a new projection at the top based on mvExpand.output(), plugging back in the aliases | ||
| List<NamedExpression> newProjections = new ArrayList<>(); | ||
| for (Attribute outputExpr : mvExpand.output()) { | ||
| Alias alias = aliases.resolve(outputExpr, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't normally use resolve for an AttributeMap<Alias>. It's really meant for AttributeMap<Attribute> because it can deal with multiple indirections. We don't do that here, so we should rather just use get.
|
|
||
| // Create a new projection at the top based on mvExpand.output(), plugging back in the aliases | ||
| List<NamedExpression> newProjections = new ArrayList<>(); | ||
| for (Attribute outputExpr : mvExpand.output()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, mvExpand has been mutated before reaching here. I can't say why this would be correct to use.
alex-spies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @kanoshiou , I found some things that don't make sense to me. Maybe I'm too tired and they will make more sense tomorrow, but I think we need to iterate one more time before we can get this in.
Let me know if you'd like to give it another shot - otherwise, I think I know how to touch up the optimizer rule and can happily get this over the finish line myself.
|
Thanks for the review, @alex-spies! And yes, I'd like to give it another shot! |
…as collection and improving the pushdown skip condition.
…methods - Extract MvExpand pushdown logic into reusable PushDownUtils methods - Replace inline alias mapping with PushDownUtils.pushDownPastProject() - Use PushDownUtils.resolveRenamesFromMap() to handle attribute resolution - Reduce code duplication and improve maintainability of optimizer rules - Simplify the projection reconstruction by delegating to utility functions
…ojections - Replace AttributeMap import with Attribute for more direct attribute handling - Inline MvExpand pushdown logic instead of delegating to utility methods - Directly update projections to point to expanded attributes during pushdown - Handle both Alias and direct NamedExpression cases when replacing target references - Simplify the overall flow by constructing Project directly with updated projections - Improve code clarity and maintainability by making the transformation explicit
This PR enables
MV_EXPANDto be pushed down pastProjectoperations, reducing unnecessary field extractions caused byKEEP *and improving query efficiency.Closes #136292
Closes #136596
I believe this PR also closes #119074. Although I couldn’t reproduce the plan mentioned there on current main, it addresses the same topic, so it should be safe to close that issue.