Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditionally disable evaluation for grouping functions #2706

Open
wants to merge 3 commits into
base: integration
Choose a base branch
from

Conversation

apmoriarty
Copy link
Collaborator

Conditionally disable evaluation for queries with grouping functions

@apmoriarty apmoriarty force-pushed the feature/improve-grouping-disable-evaluation branch from 2cb9dae to 66b1fde Compare January 22, 2025 16:34
@apmoriarty apmoriarty force-pushed the feature/improve-grouping-disable-evaluation branch from 78c2742 to b05f7e4 Compare January 27, 2025 10:07
@apmoriarty apmoriarty requested a review from awrigh11 January 27, 2025 12:50
awrigh11
awrigh11 previously approved these changes Jan 27, 2025
* <p>
* Evaluation cannot be disabled if any content, query, or filter functions exist, or if delayed or evaluation only markers are present.
*/
public class DisableEvaluationForGroupingVisitor extends ShortCircuitBaseVisitor {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to work through the cases here. Fundamentally I think the goal is we can disable evaluation if the query is either fully satisfied against the global index, or is fully satisfied by the field index? I think there might be an opportunity to use existing code here [and maybe cleanup the existing code] so that as query capabilities change this doesn't get left behind.

Copy link
Collaborator Author

@apmoriarty apmoriarty Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards not using the executable determination visitor for a few reasons.

  • First, it looks like even if the "isForFieldIndex" flag is set it will say that marker nodes for range and regex terms are not executable on the field index and that's simply not correct.
  • Second, the visit(EQ) method has some conflicting logic for NO_FIELD and ANYFIELD. It will set the state to ignorable if the field is "NO_FIELD", but the next check is for 'nofield' or 'anyfield' and sets the state to executable.
  • The executable determination visitor doesn't check if a negation is the root of the query tree. While technically executable the nested iterator logic won't support it.

That said, there are a few more edge cases I need to work through for the new visitor, specifically:

  • handling the upper and lower bounds of a bounded range. I don't want to assume that other visitors did their job correctly, so I would still like to check the indexed state (or lack thereof) of the fields.
  • correctly handling negations within intersections and unions, and combinations thereof
  • clarify when this visitor can be used because it's essentially general purpose but the name implies otherwise

Copy link
Collaborator

@FineAndDandy FineAndDandy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see the Visitor used to determine disabling evaluation revisited. Then this can generally be used even outside of grouping functions

@apmoriarty apmoriarty force-pushed the feature/improve-grouping-disable-evaluation branch from 8c87fa5 to 1f71cd4 Compare January 28, 2025 13:42
@ivakegg
Copy link
Collaborator

ivakegg commented Jan 29, 2025

Does not the existing SatisfactionVisitor, the QueryIterator.createIteratorBuildingVisitor(... isQueryFullySatistifed ...) and the fieldIndexSatisfiesQuery member of the QueryIterator already achieve this goal ? I think we are reinventing something that already exists. If that mechanism is not working then we need to fix that instead please.

@ivakegg
Copy link
Collaborator

ivakegg commented Jan 29, 2025

Also why is this specific to grouping functions (per the title)?

@apmoriarty
Copy link
Collaborator Author

Does not the existing SatisfactionVisitor, the QueryIterator.createIteratorBuildingVisitor(... isQueryFullySatistifed ...) and the fieldIndexSatisfiesQuery member of the QueryIterator already achieve this goal ? I think we are reinventing something that already exists. If that mechanism is not working then we need to fix that instead please.

In short, because the SatisfactionVisitor is wrong. Also the SatisfactionVisitor will not be executed if the HIT_LIST option is specified. But to your point, the SatisfactionVisitor does not actually do anything to disable evaluation -- it merely determines the order of evaluation and event aggregation. So this visitor that determines if we can disable evaluation answers a completely separate and distinct question vs. the SatisfactionVisitor.

If the field index satisfies the query then evaluation happens before event aggregation. If the field index does not satisfy the query then event aggregation must happen before evaluation.

See #2713 for examples where the SatisfactionVisitor gets the wrong answer. I'm working on fixing that.

As to why this is specific to grouping functions, the grouping function is only concerned with counting fields. There are no hit terms involved, thus under specific circumstances we can just disable evaluation altogether.

@ivakegg
Copy link
Collaborator

ivakegg commented Jan 29, 2025

Ok, I see the difference now. So disabling evaluation can be done if the field index lookups will precisely reduce the candidates to only those that would evaluate to true, whereas the satisfaction visitor mechanism presumes that the values pulled from the field index may not evaluate to true? Is that actually possible? I am wondering whether the whole satisfaction visitor and field index satisfaction mechanism should avoid evaluation altogether and hence support this use case.

@ivakegg
Copy link
Collaborator

ivakegg commented Jan 29, 2025

I must say I am hesitant to avoid evaluation in any circumstance as this provides some fault tolerance to ensure that events match the original query.

@ivakegg
Copy link
Collaborator

ivakegg commented Jan 30, 2025

After many more discussions, I would much rather fix the satisfaction iterator to handle more cases than allow the addition of a path that can bypass evaluation. Once that path is built, then it will be used for other things which is more than I am willing to allow. Please focus on fixing the satisfaction iterator to allow more cases to be evaluated prior to aggregation which I expect is the main reason for the performance gain that you had witnessed.

Copy link
Collaborator

@ivakegg ivakegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need different approach

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants