Skip to content

Conversation

@kanoshiou
Copy link
Contributor

@kanoshiou kanoshiou commented Oct 10, 2025

This PR enables MV_EXPAND to be pushed down past Project operations, reducing unnecessary field extractions caused by KEEP * and improving query efficiency.

Closes #136292
Closes #136596

I believe this PR also closes #119074. Although I couldn’t reproduce the plan mentioned there on current main, it addresses the same topic, so it should be safe to close that issue.

@elasticsearchmachine elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team needs:triage Requires assignment of a team area label v9.3.0 labels Oct 10, 2025
@kanoshiou kanoshiou changed the title ESQL: Push down MvExpand past `Project ESQL: Push down MvExpand past Project Oct 10, 2025
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Oct 10, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@wchaparro wchaparro requested a review from astefan November 6, 2025 14:50
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @kanoshiou !

I think this is very nice, but I think there's an approach that doesn't create defensive Evals to preserve the attribute. I'd like to explore that approach first before giving up on it.

}
}

Project project = PushDownUtils.pushDownPastProject(mvExpand);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be simpler to just put a brand new projection at the top based on mvExpand.output(), but plugging back in the aliases that the original projection created. That's how we do it in ReplaceAliasingEvalWithProject (although that code is a little complex to use as reference :/ )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to take my own advice back, sorry! Given that we already update the list of projections, I think going back to mvExpand.output() makes the code more confusing, and it's likely simpler to just use the updated projections list directly to create the new Projection that goes on top of the mvExpand.

@kanoshiou
Copy link
Contributor Author

Thanks for your review @alex-spies!

I believe aliases are not only generated by renames and an alias’s source value may still be used elsewhere.

For example:

row a = 1 | eval b = a | keep * | mv_expand b

In this case, b is an alias for a, and a is still required in the output.
If we push down mv_expand and use the original value of b (which is a) as the target, a would no longer be available, even though it’s still needed in the output.
Therefore, a defensive Eval is injected to preserve the original value.

kanoshiou and others added 8 commits November 9, 2025 19:21
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
- Enhance PushDownMvExpandPastProject rule to handle more complex query scenarios
- Add checks to prevent duplicate output attributes when pushing down MvExpand
- Support pushing down MvExpand for queries with aliased expressions
- Handle cases where target is an alias or referenced in an alias
- Add test case to verify complex MvExpand push-down behavior
Resolves edge cases in multi-stage query optimization where MvExpand interactions with Project and Eval nodes were previously limited.
- Refactored PushDownMvExpandPastProject to handle more complex projection scenarios
- Added support for preserving original projection aliases during MvExpand push-down
- Replaced previous push-down implementation with more robust alias resolution
- Ensured that output expressions maintain their original context and aliases
- Updated test case to reflect new push-down behavior
The changes improve the flexibility and accuracy of MvExpand push-down optimization in the logical plan optimizer.
- Simplified push-down logic for MvExpand past Project operations
- Removed unnecessary Limit operation from example comment
- Used `replaceChild()` method instead of creating a new MvExpand instance
- Streamlined code for pushing down MvExpand past Project nodes
- Improved code readability and reduced complexity of optimizer rule
- Modify PushDownMvExpandPastProject to handle more complex projection scenarios
- Update PushDownUtils to make resolveRenamesFromProject method public
- Enhance test cases in LogicalPlanOptimizerTests to validate new push-down behavior
- Simplify MvExpand and Project interaction by removing unnecessary Eval steps
- Improve optimization logic for multi-stage MvExpand operations
…astProject

- Refactor condition in semantic equality check for MvExpand push-down
- Modify logic to handle alias comparisons more precisely
- Simplify condition for checking input references in project operations
- Improve code readability and reduce complexity of push-down rule
Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to wrap my head around this problem, and I came to the same conclusion as @kanoshiou: MV_EXPAND drops the source attribute, so we can't just move the aliasing after that, or we'll lose the original value.

This said, the combination of this new rule with ReplaceAliasingEvalWithProject tends to be confusing, because one reverts the other. Right now it's not problematic, because ReplaceAliasingEvalWithProject is in the Substitutions phase, while this new rule is in the Operator Optimizations phase, so they never run in a loop. If they did, we'd have an infinite loop.

Comments like this make me think that the problematic bit here is ReplaceAliasingEvalWithProject, and aliases in Project in general
(btw, we have at least another rule that puts aliases into projections that is PushDownJoinPastProject), but this makes the topic too complex probably.

On a side note, ReplaceAliasingEvalWithProject looks like a nice optimization, but I have no evidence that it is effective in practice (do we have a benchmark for it?). I guess an EVAL a = b could be easily optimized at compute level, just using a single block for both.

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kanoshiou , you were right about the need for the defensive Evals.

I think this approach will work and have done another round of review. I think we can deal with the UnionAll problem, see below.

testPushDownMvExpandPastProject2
required_capability: push_down_mv_expand_past_project
from languages
| eval a = 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe we could add some tests where the expanded value is actually an MV, like eval = [1,2].

Comment on lines +33 to +35
// MvExpand[salary{r}#168,salary{r}#175]
// \_Project[[$$salary$converted_to$keyword{r$}#178 AS salary#168]]
// \_UnionAll[[salary{r}#174, $$salary$converted_to$keyword{r$}#178]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, don't we just need to create a new expanded attribute?

We can create a new (synthetic) reference attribute $$temp$name$#200. Then we'd turn this into

Project[[$$temp$name$#200 AS salary#175]]
\_MvExpand[$$salary$converted_to$keyword{r$}#178,$$temp$name$#200]
  \_UnionAll[[salary{r}#174, $$salary$converted_to$keyword{r$}#178]]

without defensive evals.

With a defensive eval, I think this would be

Project[[$$expanded$temp$name#200 AS salary#175]]
\_MvExpand[$$target$temp$name#199, $$expanded$temp$name#200]
  \_Eval[$$salary$converted_to$keyword{r$}#178 AS $$target$temp$name#199]
    \_UnionAll[[salary{r}#174, $$salary$converted_to$keyword{r$}#178]]

Comment on lines +46 to +48
// Check if the alias's original field (child) is referenced elsewhere in the projections.
// If the original field is not referenced by any other projection or alias,
// we don't need to inject an Eval to preserve it, and can safely resolve renames and push down.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very helpful comments, btw!

Comment on lines 37 to 39
if (projections.stream().anyMatch(e -> e instanceof Alias alias && inputNames.contains(alias.toAttribute().name()))) {
return mvExpand;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too broad. We only get a problem with duplicate output attributes if

  • the expanded field has the same name as a field in the projections input set, and
  • the projection shadows that specific field from the projection input set.

Comment on lines 65 to 66
projections.set(i, mvExpand.expanded());
pj = new Project(pj.source(), new Eval(aliasAlias.source(), pj.child(), List.of(aliasAlias)), projections);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing. The projection at i is the expanded attribute, but this only exists after the mvExpand, so this Project here is inconsistent. This looks like a bug and I don't understand how the code in the next code block (lines 84 and below) works based on this inconsistent state.

It looks to me like the projections list was meant to become the final projections that we put on top of the mv expand. This is also a valid approach, and this code seems to nearly do it - the only thing left to do is to update the projections in the case where the target field is an un-aliased projection.

}
}

// Build a map of aliases from the original projection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The projection pj can be updated by the code above. I can't verify that using the aliases from pj is the right thing here.

}
}

Project project = PushDownUtils.pushDownPastProject(mvExpand);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to take my own advice back, sorry! Given that we already update the list of projections, I think going back to mvExpand.output() makes the code more confusing, and it's likely simpler to just use the updated projections list directly to create the new Projection that goes on top of the mvExpand.

// Create a new projection at the top based on mvExpand.output(), plugging back in the aliases
List<NamedExpression> newProjections = new ArrayList<>();
for (Attribute outputExpr : mvExpand.output()) {
Alias alias = aliases.resolve(outputExpr, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't normally use resolve for an AttributeMap<Alias>. It's really meant for AttributeMap<Attribute> because it can deal with multiple indirections. We don't do that here, so we should rather just use get.


// Create a new projection at the top based on mvExpand.output(), plugging back in the aliases
List<NamedExpression> newProjections = new ArrayList<>();
for (Attribute outputExpr : mvExpand.output()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, mvExpand has been mutated before reaching here. I can't say why this would be correct to use.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @kanoshiou , I found some things that don't make sense to me. Maybe I'm too tired and they will make more sense tomorrow, but I think we need to iterate one more time before we can get this in.

Let me know if you'd like to give it another shot - otherwise, I think I know how to touch up the optimizer rule and can happily get this over the finish line myself.

@kanoshiou
Copy link
Contributor Author

Thanks for the review, @alex-spies! And yes, I'd like to give it another shot!

…as collection and improving the pushdown skip condition.
…methods

- Extract MvExpand pushdown logic into reusable PushDownUtils methods
- Replace inline alias mapping with PushDownUtils.pushDownPastProject()
- Use PushDownUtils.resolveRenamesFromMap() to handle attribute resolution
- Reduce code duplication and improve maintainability of optimizer rules
- Simplify the projection reconstruction by delegating to utility functions
…ojections

- Replace AttributeMap import with Attribute for more direct attribute handling
- Inline MvExpand pushdown logic instead of delegating to utility methods
- Directly update projections to point to expanded attributes during pushdown
- Handle both Alias and direct NamedExpression cases when replacing target references
- Simplify the overall flow by constructing Project directly with updated projections
- Improve code clarity and maintainability by making the transformation explicit
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Projects

None yet

5 participants