Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix incorrect predicate pushdown for predicates referring to right-join key columns #21293

Merged
merged 4 commits into from
Feb 17, 2025

Conversation

nameexhaustion
Copy link
Collaborator

@nameexhaustion nameexhaustion commented Feb 17, 2025

Non-suffixed columns were incorrectly identified as belonging to the RHS table due to using check_input_node(predicate.node(), &schema_right, expr_arena) (which incorrectly matches to the non-suffixed column on the RHS before join). We now use ExprOrigin to correctly identify this.

This PR also improves predicate pushdown for predicates referring only to RHS columns in right-joins:

Before

FILTER col("values_right").is_null() FROM
  RIGHT JOIN:
  LEFT PLAN ON: [col("key")]
    DF ["key", "values"]; PROJECT */2 COLUMNS
  RIGHT PLAN ON: [col("key")]
    DF ["key", "values"]; PROJECT */2 COLUMNS
  END RIGHT JOIN

After

RIGHT JOIN:
LEFT PLAN ON: [col("key")]
  DF ["key", "values"]; PROJECT */2 COLUMNS
RIGHT PLAN ON: [col("key")]
  FILTER col("values").is_null() FROM
    DF ["key", "values"]; PROJECT */2 COLUMNS
END RIGHT JOIN

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Feb 17, 2025
Copy link

codecov bot commented Feb 17, 2025

Codecov Report

Attention: Patch coverage is 94.06780% with 7 lines in your changes missing coverage. Please review.

Project coverage is 79.89%. Comparing base (5e51218) to head (4c3155b).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
...lan/src/plans/optimizer/predicate_pushdown/join.rs 87.50% 4 Missing ⚠️
...ates/polars-plan/src/plans/optimizer/join_utils.rs 96.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21293      +/-   ##
==========================================
+ Coverage   79.82%   79.89%   +0.06%     
==========================================
  Files        1596     1596              
  Lines      228565   228574       +9     
  Branches     2608     2608              
==========================================
+ Hits       182462   182622     +160     
+ Misses      45507    45356     -151     
  Partials      596      596              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

use crate::plans::visitor::{AexprNode, RewriteRecursion, RewritingVisitor, TreeWalker};
use crate::plans::{ExprIR, MintermIter, OutputName};

fn remove_suffix<'a>(
Copy link
Collaborator Author

@nameexhaustion nameexhaustion Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code moved to join_utils below for re-use

ExprOrigin::None,
|acc_origin, column_name| {
Ok(acc_origin
| Self::get_column_origin(&column_name, left_schema, right_schema, suffix)?)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified to use BitOr

@nameexhaustion nameexhaustion marked this pull request as ready for review February 17, 2025 07:38
@nameexhaustion nameexhaustion changed the title fix: Fix incorrect predicate pushdown in some cases for right-joins fix: Fix incorrect predicate pushdown for predicates referring to right-join key columns Feb 17, 2025
@ritchie46 ritchie46 merged commit a438cc3 into pola-rs:main Feb 17, 2025
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filtering on duplicate columns after lazy right-join giving incorrect results
2 participants