fix: Fix unequal DataFrame column heights from parquet hive scan with filter #21340
&mut df,
schema.as_ref(),
hive_partition_columns,
md.num_rows(),
The cause was materializing hive partition columns with height md.num_rows() on a filtered DataFrame that may have fewer rows.
Fix this by removing the extra n_rows arg. It was needed in the past because empty DataFrames did not have the height property. Now that we have the height property, we simply need to ensure it is set to the correct height for empty DataFrames.
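To make the failure mode described above concrete, here is a minimal sketch in plain Python. It is a toy model, not the Polars implementation: `ToyFrame`, `add_hive_column_buggy`, and `add_hive_column_fixed` are hypothetical names invented for illustration, and the "no columns" case reflects one reading of "empty DataFrames" in the comment.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFrame:
    """Hypothetical stand-in for a DataFrame: columns plus an explicit height."""
    columns: dict = field(default_factory=dict)
    height: int = 0

    def filter(self, mask):
        # Keep only rows where the mask is True; the height is tracked
        # separately so it stays correct even when there are no columns.
        kept = {
            name: [v for v, keep in zip(values, mask) if keep]
            for name, values in self.columns.items()
        }
        return ToyFrame(kept, sum(mask))

    def column_heights(self):
        return {name: len(values) for name, values in self.columns.items()}


def add_hive_column_buggy(df, name, value, file_num_rows):
    # Assumed pre-fix behaviour: the column height comes from the file
    # metadata row count, ignoring that the frame was already filtered.
    df.columns[name] = [value] * file_num_rows


def add_hive_column_fixed(df, name, value):
    # Assumed post-fix behaviour: the column height comes from the frame.
    df.columns[name] = [value] * df.height


file_num_rows = 5
mask = [False, False, False, True, True]  # filter keeps 2 of 5 rows

buggy = ToyFrame({"x": [1, 2, 3, 4, 5]}, file_num_rows).filter(mask)
add_hive_column_buggy(buggy, "part", "a", file_num_rows)

fixed = ToyFrame({"x": [1, 2, 3, 4, 5]}, file_num_rows).filter(mask)
add_hive_column_fixed(fixed, "part", "a")

# A frame with no data columns still carries the filtered height.
no_columns = ToyFrame({}, file_num_rows).filter(mask)
add_hive_column_fixed(no_columns, "part", "a")

print(buggy.column_heights())
print(fixed.column_heights())
print(no_columns.column_heights())
```

In this model the buggy path leaves the partition column taller than the data column, which is the unequal-heights condition from the PR title, while the fixed path only relies on the frame's own height being correct.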
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21340      +/-   ##
==========================================
- Coverage   79.93%   79.93%   -0.01%
==========================================
  Files        1596     1596
  Lines      228623   228599      -24
  Branches     2618     2618
==========================================
- Hits       182752   182732      -20
+ Misses      45272    45268       -4
  Partials      599      599
# symlinking the binary.
ln -sv \
$(python -c "import importlib; print(importlib.util.find_spec('polars').submodule_search_locations[0] + '/polars.abi3.so')") \
py-polars/polars/polars.abi3.so
I need this for the new test case I added below
These are the CI logs for the failure: https://github.com/pola-rs/polars/actions/runs/13417351173/job/37481333425
Still panicking for me.
@Bidek56, can you open a separate issue with a reproducible example?
scan_parquet and filter on single file with hive partition, in single threaded mode #21327