Conditionally disable evaluation for grouping functions #2706
base: integration
2cb9dae to 66b1fde
warehouse/query-core/src/main/java/datawave/query/planner/DefaultQueryPlanner.java
78c2742 to b05f7e4
 * <p>
 * Evaluation cannot be disabled if any content, query, or filter functions exist, or if delayed or evaluation only markers are present.
 */
public class DisableEvaluationForGroupingVisitor extends ShortCircuitBaseVisitor {
I've been trying to work through the cases here. Fundamentally, I think the goal is that we can disable evaluation if the query is fully satisfied either against the global index or by the field index? I think there might be an opportunity to use existing code here (and maybe clean up the existing code) so that this doesn't get left behind as query capabilities change.
I'm leaning towards not using the executable determination visitor for a few reasons.
- First, it looks like even if the "isForFieldIndex" flag is set it will say that marker nodes for range and regex terms are not executable on the field index and that's simply not correct.
- Second, the visit(EQ) method has some conflicting logic for NO_FIELD and ANYFIELD. It will set the state to ignorable if the field is "NO_FIELD", but the next check is for 'nofield' or 'anyfield' and sets the state to executable.
- Third, the executable determination visitor doesn't check if a negation is the root of the query tree. While that is technically executable, the nested iterator logic won't support it.
That said, there are a few more edge cases I need to work through for the new visitor, specifically:
- handling the upper and lower bounds of a bounded range. I don't want to assume that other visitors did their job correctly, so I would still like to check the indexed state (or lack thereof) of the fields.
- correctly handling negations within intersections and unions, and combinations thereof
- clarifying when this visitor can be used, because it's essentially general purpose but the name implies otherwise
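The edge cases listed above can be sketched as a small standalone check. This is a minimal illustration only, not the PR's visitor: the `Node` tree, the `DisableEvaluationSketch` class, and the field names are hypothetical stand-ins for the real JEXL AST and metadata lookups, and the rules for negations inside intersections and unions are a conservative assumption rather than the behavior settled on in this review.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these types are DataWave classes.
class DisableEvaluationSketch {

    // Simplified query tree standing in for the real JEXL AST.
    static class Node {
        enum Type { AND, OR, NOT, EQ, GE, LE, BOUNDED_RANGE, FUNCTION, MARKER }

        final Type type;
        final String field;        // null for non-leaf nodes
        final List<Node> children; // empty for leaves

        Node(Type type, String field, Node... children) {
            this.type = type;
            this.field = field;
            this.children = Arrays.asList(children);
        }

        static Node leaf(Type type, String field) { return new Node(type, field); }
        static Node of(Type type, Node... children) { return new Node(type, null, children); }
    }

    private final Set<String> indexedFields;

    DisableEvaluationSketch(Set<String> indexedFields) { this.indexedFields = indexedFields; }

    boolean canDisableEvaluation(Node root) {
        // A negation at the root is rejected up front, per the discussion above.
        return root.type != Node.Type.NOT && check(root);
    }

    private boolean check(Node node) {
        switch (node.type) {
            case EQ:
                return indexedFields.contains(node.field);
            case BOUNDED_RANGE:
                return boundsAreIndexed(node);
            case AND:
                return checkIntersection(node);
            case OR:
                // Assumption: every term of a union must pass. A negated term fails
                // because NOT falls through to the default case below.
                for (Node child : node.children) {
                    if (!check(child)) return false;
                }
                return !node.children.isEmpty();
            default:
                // FUNCTION, MARKER, an unbounded GE/LE, or a NOT outside an intersection.
                return false;
        }
    }

    // Assumption: a negated term is tolerated in an intersection only when at least
    // one non-negated term anchors it and the negated term itself passes the check.
    private boolean checkIntersection(Node and) {
        boolean hasPositiveTerm = false;
        for (Node child : and.children) {
            Node term = child;
            if (child.type == Node.Type.NOT) {
                if (child.children.size() != 1) return false;
                term = child.children.get(0);
            } else {
                hasPositiveTerm = true;
            }
            if (!check(term)) return false;
        }
        return hasPositiveTerm;
    }

    // Do not trust earlier visitors: verify both bounds name the same indexed field.
    private boolean boundsAreIndexed(Node range) {
        if (range.children.size() != 2) return false;
        Node lower = range.children.get(0);
        Node upper = range.children.get(1);
        return lower.type == Node.Type.GE && upper.type == Node.Type.LE
                && lower.field != null && lower.field.equals(upper.field)
                && indexedFields.contains(lower.field);
    }

    public static void main(String[] args) {
        DisableEvaluationSketch sketch =
                new DisableEvaluationSketch(new HashSet<>(Arrays.asList("COLOR", "SIZE")));
        Node query = Node.of(Node.Type.AND,
                Node.leaf(Node.Type.EQ, "COLOR"),
                Node.of(Node.Type.NOT, Node.leaf(Node.Type.EQ, "SIZE")));
        System.out.println(sketch.canDisableEvaluation(query));
    }
}
```

Rejecting anything the sketch does not recognize (the `default` branch) keeps the check fail-safe: an unknown node type leaves evaluation enabled rather than silently skipping it.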
I'd like to see the visitor used to determine whether evaluation can be disabled revisited. Then it could be used generally, even outside of grouping functions.
…stances Feature flag disabling evaluation for queries with groupby functions
8c87fa5 to 1f71cd4
Don't the existing SatisfactionVisitor, the QueryIterator.createIteratorBuildingVisitor(... isQueryFullySatisfied ...) and the fieldIndexSatisfiesQuery member of the QueryIterator already achieve this goal? I think we are reinventing something that already exists. If that mechanism is not working, then we need to fix that instead, please.
Also, why is this specific to grouping functions (per the title)?
In short, because the SatisfactionVisitor is wrong. Also, the SatisfactionVisitor will not be executed if the HIT_LIST option is specified. But to your point, the SatisfactionVisitor does not actually do anything to disable evaluation; it merely determines the order of evaluation and event aggregation. So this visitor, which determines whether we can disable evaluation, answers a completely separate question from the SatisfactionVisitor. If the field index satisfies the query, then evaluation happens before event aggregation. If the field index does not satisfy the query, then event aggregation must happen before evaluation. See #2713 for examples where the SatisfactionVisitor gets the wrong answer. I'm working on fixing that. As to why this is specific to grouping functions: the grouping function is only concerned with counting fields. There are no hit terms involved, so under specific circumstances we can just disable evaluation altogether.
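The distinction drawn in the comment above, ordering versus disabling, can be sketched as two independent inputs to a plan. This is an illustrative model only: `PipelineOrderSketch`, its `Step` names, and the `plan` method are hypothetical and do not correspond to QueryIterator code. The sketch also assumes that evaluation can only be skipped when the field index satisfies the query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two separate decisions described above.
class PipelineOrderSketch {
    enum Step { FIELD_INDEX_LOOKUP, EVALUATION, EVENT_AGGREGATION }

    /**
     * @param fieldIndexSatisfiesQuery the ordering question (what the SatisfactionVisitor answers)
     * @param evaluationDisabled       the separate question (what the proposed visitor answers)
     */
    static List<Step> plan(boolean fieldIndexSatisfiesQuery, boolean evaluationDisabled) {
        List<Step> steps = new ArrayList<>();
        steps.add(Step.FIELD_INDEX_LOOKUP);
        if (fieldIndexSatisfiesQuery) {
            // Evaluation can run before aggregation, or be skipped if it was disabled.
            if (!evaluationDisabled) {
                steps.add(Step.EVALUATION);
            }
            steps.add(Step.EVENT_AGGREGATION);
        } else {
            // Evaluation needs the aggregated event, so it runs last and
            // (by assumption in this sketch) is never skipped.
            steps.add(Step.EVENT_AGGREGATION);
            steps.add(Step.EVALUATION);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(plan(true, false));
        System.out.println(plan(true, true));
        System.out.println(plan(false, true));
    }
}
```

Keeping the two flags separate mirrors the point made in the comment: a satisfied field index changes when evaluation runs, while the disable decision changes whether it runs at all.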
OK, I see the difference now. So disabling evaluation can be done if the field index lookups will precisely reduce the candidates to only those that would evaluate to true, whereas the satisfaction visitor mechanism presumes that the values pulled from the field index may not evaluate to true? Is that actually possible? I am wondering whether the whole satisfaction visitor and field index satisfaction mechanism should avoid evaluation altogether and hence support this use case.
I must say I am hesitant to avoid evaluation in any circumstance, as evaluation provides some fault tolerance to ensure that events match the original query.
After many more discussions, I would much rather fix the satisfaction iterator to handle more cases than allow the addition of a path that can bypass evaluation. Once that path is built, it will be used for other things, which is more than I am willing to allow. Please focus on fixing the satisfaction iterator to allow more cases to be evaluated prior to aggregation, which I expect is the main reason for the performance gain that you witnessed.
Need different approach
Conditionally disable evaluation for queries with grouping functions