fix(rust): Preserve List inner dtype during chunked take operations #25634
base: main
Codecov Report
❌ Patch coverage is
Additional details and impacted files

    @@            Coverage Diff             @@
    ##             main   #25634      +/-   ##
    ==========================================
    + Coverage   79.35%   79.44%   +0.09%
    ==========================================
      Files        1743     1743
      Lines      240295   240328      +33
      Branches     3038     3038
    ==========================================
    + Hits       190683   190928     +245
    + Misses      48830    48618     -212
      Partials      782      782

      let ca = self.list().unwrap();
    - ca.take_chunked_unchecked(by, sorted, avoid_sharing)
    -     .into_series()
    + let taken = ca.take_chunked_unchecked(by, sorted, avoid_sharing);
I don't think this is the right place to fix it. Could we instead update the `take_chunked_unchecked()` impl for `ChunkedArray` to ensure the original type is restored there?

    unsafe fn take_chunked_unchecked<const B: u64>(
Branch updated from c725f27 to dc53b56.
When performing left/right joins on chunked DataFrames (common with the native Parquet reader), the `take_chunked_unchecked` and `take_opt_chunked_unchecked` methods would lose dtype information for nested types like `List(Categorical)`. The issue was that `ChunkedArray::with_chunk` re-infers the dtype from the physical Arrow array, causing `List(Categorical)` to become `List(UInt32)`. The fix uses `ChunkedArray::with_chunk_like` instead, which preserves the original ChunkedArray's dtype when constructing the result.

Fixes pola-rs#25626
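To make the failure mode concrete, here is a minimal toy model in Rust. Everything in it (`DType`, `PhysicalArray`, `ChunkedArrayModel`, `infer_dtype`) is hypothetical and only mimics the behaviour described in the commit message: inferring the dtype from physical storage drops the `Categorical` layer, while copying it from a template keeps it. It is not polars code and does not match the real `with_chunk`/`with_chunk_like` signatures.

```rust
// Hypothetical toy model, NOT the polars API.
// A logical dtype sits on top of physical storage; categorical values are
// physically stored as u32 codes, so the storage alone cannot tell them apart.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    UInt32,
    Categorical,
    List(Box<DType>),
}

// Physical (Arrow-like) storage: only primitive layouts, no logical info.
#[derive(Debug, Clone, PartialEq)]
enum PhysicalArray {
    UInt32(Vec<u32>),
    List(Vec<Vec<u32>>), // each list element holds u32 codes
}

#[derive(Debug, Clone)]
struct ChunkedArrayModel {
    dtype: DType,
    chunks: Vec<PhysicalArray>,
}

// Infer a dtype purely from the physical layout: the Categorical layer is lost.
fn infer_dtype(arr: &PhysicalArray) -> DType {
    match arr {
        PhysicalArray::UInt32(_) => DType::UInt32,
        PhysicalArray::List(_) => DType::List(Box::new(DType::UInt32)),
    }
}

impl ChunkedArrayModel {
    // Models the described `with_chunk` behaviour: dtype re-inferred from storage.
    fn with_chunk(arr: PhysicalArray) -> Self {
        Self { dtype: infer_dtype(&arr), chunks: vec![arr] }
    }

    // Models the described `with_chunk_like` behaviour: dtype copied from a template.
    fn with_chunk_like(template: &Self, arr: PhysicalArray) -> Self {
        Self { dtype: template.dtype.clone(), chunks: vec![arr] }
    }
}
```

In this model, building the taken list chunk with `with_chunk` reports `List(UInt32)`, whereas `with_chunk_like(&original, taken)` reports the original `List(Categorical)`.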
Branch updated from dc53b56 to 233d6cc.
Thanks
When performing left/right joins on chunked DataFrames, the `take_chunked_unchecked` and `take_opt_chunked_unchecked` methods for `List` types would lose the inner dtype information. This caused `List(Categorical)` to become `List(UInt32)` because `ChunkedArray::with_chunk` re-infers the dtype from the physical Arrow array.

The fix preserves the original dtype by using `Series::from_chunks_and_dtype_unchecked` with the original `self.dtype()` instead of letting it be re-inferred.

Fixes #25626
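The same idea can be sketched for a chunked take: gather rows from several chunks into one new chunk, then build the result with the source's dtype passed in explicitly instead of inferred. This is again a hypothetical model. `SeriesModel`, `from_chunks_and_dtype`, and `take_chunked` are illustrative stand-ins for `Series::from_chunks_and_dtype_unchecked` and `take_chunked_unchecked`, and row addresses are plain `(chunk_index, row_index)` tuples rather than the index type polars actually uses.

```rust
// Hypothetical toy model, NOT the polars API.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    UInt32,
    Categorical,
    List(Box<DType>),
}

// Stand-in for a Series: a logical dtype plus physical list chunks,
// where each list element is stored as its u32 physical codes.
#[derive(Debug, Clone)]
struct SeriesModel {
    dtype: DType,
    chunks: Vec<Vec<Vec<u32>>>,
}

// Analogue of a "from chunks and dtype" constructor: the caller supplies
// the dtype, so nothing is re-inferred from the physical data.
fn from_chunks_and_dtype(chunks: Vec<Vec<Vec<u32>>>, dtype: DType) -> SeriesModel {
    SeriesModel { dtype, chunks }
}

/// Gathers the rows addressed by `(chunk_index, row_index)` pairs into a
/// single new chunk and rebuilds the result with the source dtype.
/// Unlike an `_unchecked` method, this sketch bounds-checks via slice
/// indexing and panics on an invalid index.
fn take_chunked(src: &SeriesModel, by: &[(usize, usize)]) -> SeriesModel {
    let gathered: Vec<Vec<u32>> = by
        .iter()
        .map(|&(chunk, row)| src.chunks[chunk][row].clone())
        .collect();
    // The key step: reuse the source dtype rather than inferring it.
    from_chunks_and_dtype(vec![gathered], src.dtype.clone())
}
```

Passing `src.dtype.clone()` is the step that matters here: the gathered physical data alone would only describe a list of `u32`, so any inference at this point would yield `List(UInt32)`.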