-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement predicate pruning for not like expressions #14567
base: main
Are you sure you want to change the base?
Conversation
FYI @adriangb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this should also work with no wildcards col not like 'foo'
? Or do we optimize like expressions without wildcards into an equality check? If we don't already do that optimization we probably should 😄
#[tokio::test] | ||
async fn test_utf8_not_like_ecsape() { | ||
Utf8Test::new(|value| col("a").not_like(lit(format!("\\%{}%", value)))) | ||
.run() | ||
.await; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love these fuzz tests! These 3 lines of code give me great confidence that this PR does the right thing 😄
Operator::NotLikeMatch => { | ||
build_not_like_match(expr_builder).ok_or_else(|| { | ||
plan_datafusion_err!( | ||
"The NOT LIKE expression with wildcards is only supported at the end of the pattern" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general note, not necessarily for this PR: I do think we should somehow make these errors (also for the existing LIKE case) a bit better if there is any other reason they might be rejected? Or if this is the only reason, a more generic error might be better? I realize these errors likely never surface to users but I think it'd be unfortunate to see an error like this one when the actual reason it wasn't supported has nothing to do with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Now it return Result<Arc<dyn PhysicalExpr>>
// Example: For pattern "foo%bar", the row group might include values like | ||
// ["foobar", "food", "foodbar"], making it unsafe to prune. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expand upon the examples here? My naive gut instinct (which I think is wrong, I tried before 😓) is that it should be able to truncate foo%bar
to foo%
. Maybe an example that can show what would go wrong if you tried that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment to explain why it goes wrong. Please take another look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense thanks. @findepi do you know if Trino handles this and if so does it do it in similar way?
Yes. add some test to demonstrate it
|
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
For predicate
col NOT LIKE 'foo%'
, we rewrite it as(col_min NOT LIKE 'foo%' OR col_max NOT LIKE 'foo%')
. If bothcol_min
andcol_max
have the prefixfoo
, we skip the entire row group (as we can be certain that all data in this row group has the prefixfoo
).Are these changes tested?
Yes
Are there any user-facing changes?
No