-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change confidenceLevel from boolean to Enum #22957
Conversation
Why not keep it a fraction and add a method to get high low etc? |
6ca6e0e
to
1bcf746
Compare
presto-main/src/main/java/com/facebook/presto/cost/ExchangeStatsRule.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/cost/LimitStatsRule.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/cost/PlanNodeStatsEstimate.java
Outdated
Show resolved
Hide resolved
...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/cost/TableScanStatsRule.java
Show resolved
Hide resolved
...om/facebook/presto/sql/planner/iterative/rule/TestPushPartialAggregationThroughExchange.java
Outdated
Show resolved
Hide resolved
So that approach is nice because it allows for appreciating that compounding errors makes things worse, but my concern is that it's way more precision than we actually know and more than we really have a use for. For example, even though a filter on top of a join on top of a filter has a problem of compounding error, a filter on top of a tablescan is also basically a guess if there are no histograms, or even with histograms if your filter is a more complex function. Similarly if you have filters on multiple columns and you don't know the extent to which they are correlated, it's all a guess. I'm not sure it's worth teasing out the relative confidence levels of all these things because the benefit isn't clear to me. And for similar reasons it would also be hard to keep all the relative confidences consistent (like is x actually less confident than y or did the particular paths the confidence levels came through derive their computation in different ways, were implemented at different times,etc., but really we'd think y is less confident than x or they're about the same) I think until there's a clear use case for a continuous range of confidence levels, a few discrete options is simpler and does what we need. |
c49ca8d
to
d613b10
Compare
0107bba
to
b232d34
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are many other stats rules that need to be updated. check for all classes that extend simple stats rule or all usages of PlanNodeStatsEstimate.Builder
presto-main/src/test/java/com/facebook/presto/cost/TestExchangeStatsRule.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/com/facebook/presto/cost/TestExchangeStatsRule.java
Outdated
Show resolved
Hide resolved
1901e5c
to
a1a3859
Compare
a27cf02
to
f3a6550
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment, but looks good. Thanks!
presto-main/src/main/java/com/facebook/presto/cost/AggregationStatsRule.java
Outdated
Show resolved
Hide resolved
f3a6550
to
2d22250
Compare
Description
Change confidenceLevel from boolean to Enum
Motivation and Context
This will help us with future changes where we want to consider choosing the 'FACT' CBO over regular HBO
Test Plan
Reran existing tests where previosuly 'isConfident' was used with new 'confidenceLevel' functions
Contributor checklist
Release Notes
== NO RELEASE NOTE ==