Change confidenceLevel from boolean to Enum #22957

Merged: 1 commit merged on Jun 11, 2024

Conversation

Contributor

@abhinavmuk04 abhinavmuk04 commented Jun 7, 2024

Description

Change confidenceLevel from boolean to Enum

Motivation and Context

This will help with future changes where we want to consider choosing CBO estimates with 'FACT' confidence over regular HBO estimates.
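
As an illustration only (this is not the code from the PR), the idea could be sketched as below. The `LOW` and `HIGH` level names, the `Estimate` class, and the `choose` helper are all hypothetical; only the 'FACT' level and the old 'isConfident' check are mentioned in this PR.

```java
public class ConfidenceLevelSketch {
    // Hypothetical levels, ordered from least to most trusted.
    // Only FACT is named in the PR description; LOW and HIGH are assumptions.
    enum ConfidenceLevel { LOW, HIGH, FACT }

    // Hypothetical stand-in for a plan node estimate that carries a confidence level.
    static final class Estimate {
        final double outputRowCount;
        final ConfidenceLevel confidenceLevel;

        Estimate(double outputRowCount, ConfidenceLevel confidenceLevel) {
            this.outputRowCount = outputRowCount;
            this.confidenceLevel = confidenceLevel;
        }

        // Rough equivalent of the old boolean: anything at HIGH or above counts as confident.
        boolean isConfident() {
            return confidenceLevel.compareTo(ConfidenceLevel.HIGH) >= 0;
        }
    }

    // Prefer the CBO estimate only when it is a FACT; otherwise fall back to the HBO estimate.
    static Estimate choose(Estimate cbo, Estimate hbo) {
        return cbo.confidenceLevel == ConfidenceLevel.FACT ? cbo : hbo;
    }

    public static void main(String[] args) {
        Estimate hbo = new Estimate(120, ConfidenceLevel.HIGH);
        Estimate cboFact = new Estimate(100, ConfidenceLevel.FACT);
        Estimate cboLow = new Estimate(100, ConfidenceLevel.LOW);

        System.out.println(choose(cboFact, hbo).outputRowCount); // CBO estimate wins
        System.out.println(choose(cboLow, hbo).outputRowCount);  // HBO estimate wins
    }
}
```

A boolean cannot distinguish "confident" from "known exactly", which is the distinction a rule like `choose` would need.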

Test Plan

Reran the existing tests, replacing the previous 'isConfident' usages with the new 'confidenceLevel' functions.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@abhinavmuk04 abhinavmuk04 requested a review from presto-oss June 7, 2024 18:36
@abhinavmuk04 abhinavmuk04 marked this pull request as draft June 7, 2024 18:38
@jaystarshot
Member

Why not keep it a fraction and add a method to get high, low, etc.?
This can be used, e.g.:
Table scan: confidence 1.
A filter on top decreases confidence by 0.05,
so filter-tablescan can have confidence 0.95.
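
A minimal sketch of this suggestion, for illustration only: the 0.05 penalty and the starting confidence of 1 come from the example above, while the class and method names and the 0.9 cutoff for "high" are hypothetical.

```java
public class FractionalConfidenceSketch {
    enum ConfidenceLevel { LOW, HIGH }

    // Penalty per filter, taken from the example in the comment.
    static final double FILTER_PENALTY = 0.05;
    // Hypothetical cutoff for mapping a fraction to a discrete level.
    static final double HIGH_THRESHOLD = 0.9;

    // A table scan starts with full confidence.
    static double tableScanConfidence() {
        return 1.0;
    }

    // Each filter on top of a child lowers the confidence, never below zero.
    static double afterFilter(double childConfidence) {
        return Math.max(0.0, childConfidence - FILTER_PENALTY);
    }

    // The "method to get high, low, etc." from the comment.
    static ConfidenceLevel toLevel(double confidence) {
        return confidence >= HIGH_THRESHOLD ? ConfidenceLevel.HIGH : ConfidenceLevel.LOW;
    }

    public static void main(String[] args) {
        double filterOverScan = afterFilter(tableScanConfidence());
        System.out.println(filterOverScan + " -> " + toLevel(filterOverScan));

        // Stacking more filters compounds the penalty and eventually crosses the cutoff.
        double threeFilters = afterFilter(afterFilter(filterOverScan));
        System.out.println(threeFilters + " -> " + toLevel(threeFilters));
    }
}
```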

@rschlussel
Contributor

rschlussel commented Jun 7, 2024

Quoting @jaystarshot: "Why not keep it a fraction and add a method to get high, low, etc.? This can be used, e.g.: table scan: confidence 1; a filter on top decreases confidence by 0.05, so filter-tablescan can have confidence 0.95."

So that approach is nice because it acknowledges that compounding errors make things worse, but my concern is that it's way more precision than we actually know and more than we really have a use for. For example, even though a filter on top of a join on top of a filter has a problem of compounding error, a filter on top of a tablescan is also basically a guess if there are no histograms, or even with histograms if your filter is a more complex function. Similarly, if you have filters on multiple columns and you don't know the extent to which they are correlated, it's all a guess. I'm not sure it's worth teasing out the relative confidence levels of all these things because the benefit isn't clear to me.

And for similar reasons it would also be hard to keep all the relative confidences consistent. For example, is x actually less confident than y, or did the confidence levels just come through different paths that derived their computation in different ways, were implemented at different times, etc., when really we'd think y is less confident than x or that they're about the same?

I think until there's a clear use case for a continuous range of confidence levels, a few discrete options is simpler and does what we need.

@abhinavmuk04 abhinavmuk04 force-pushed the milestone1p1 branch 7 times, most recently from c49ca8d to d613b10 Compare June 10, 2024 03:41
@abhinavmuk04 abhinavmuk04 marked this pull request as ready for review June 10, 2024 03:47
@abhinavmuk04 abhinavmuk04 force-pushed the milestone1p1 branch 2 times, most recently from 0107bba to b232d34 Compare June 10, 2024 14:54
Contributor

@rschlussel rschlussel left a comment

There are many other stats rules that need to be updated. Check all classes that extend SimpleStatsRule and all usages of PlanNodeStatsEstimate.Builder.

@abhinavmuk04 abhinavmuk04 force-pushed the milestone1p1 branch 3 times, most recently from 1901e5c to a1a3859 Compare June 10, 2024 18:25
@abhinavmuk04 abhinavmuk04 marked this pull request as draft June 10, 2024 18:25
@abhinavmuk04 abhinavmuk04 marked this pull request as ready for review June 10, 2024 18:25
@abhinavmuk04 abhinavmuk04 force-pushed the milestone1p1 branch 2 times, most recently from a27cf02 to f3a6550 Compare June 10, 2024 21:38
rschlussel previously approved these changes Jun 11, 2024
Contributor

@rschlussel rschlussel left a comment

minor comment, but looks good. Thanks!

@kaikalur kaikalur merged commit f876f17 into prestodb:master Jun 11, 2024
56 checks passed
@abhinavmuk04 abhinavmuk04 linked an issue Jun 26, 2024 that may be closed by this pull request
@abhinavmuk04 abhinavmuk04 deleted the milestone1p1 branch June 28, 2024 22:34
@abhinavmuk04 abhinavmuk04 restored the milestone1p1 branch August 1, 2024 19:34
Development

Successfully merging this pull request may close these issues.

Assign different confidence level for plan estimates
4 participants