This repository was archived by the owner on Jan 7, 2025. It is now read-only.
feat: "hello world" selectivity computation #70
Merged
Gun9niR reviewed Feb 16, 2024
Gun9niR reviewed Feb 17, 2024
wangpatrick57 referenced this pull request Feb 20, 2024
"Core" means I am handling all cases where we don't fall back to hardcoded defaults (e.g. missing statistics, "col = col", "col = subquery", etc.). Note the difference between this PR and [hello world selectivity](cmu-db/optd#70). In this PR, the only file that changed is `base_cost.rs`, because all the "infrastructure" was already set up in "hello world selectivity" I have written 20 unit tests to test this core logic in a variety of scenarios (nulls vs no nulls, value in MCVs vs not in MCVs, reversing the order of children, etc.) I chose to write code-based unit tests instead of SQLPlanner-like tests because it allows fine-grained control of the expression tree down to the order of children (and because it takes a lot more work to modify SQLPlanner to output cardinality). I made an effort to make the unit tests less brittle by defining helper functions such as `bin_op()` or `const_i32()` for constructing expression trees. If our expression tree representation changes in the future, it is likely that only these helper functions need to be changed to make the unit tests work again. Some TPC-H queries require "non-core" logic, so this code doesn't currently run with all of TPC-H. To avoid crashing, I simply return INVALID_SELECTIVITY (0.001) for any branches that aren't a part of the "core". I will handle "non-core" logic in a future PR. Another case not being handled is comparisons (</<=/>=/>) between `Value` objects. Handling this requires further discussion because not all Values have a meaningful "order". In the meantime, I hardcoded converting all `Value` objects to `i32` to perform comparisons.
Description: The reason this PR contains only the simplest possible selectivity computation is that computing selectivity at all requires a lot of refactoring, so I want that refactoring reviewed separately from the "core" selectivity computation logic.
Demo: passing `test_colref_eq_constint_in_mcv` test