ENH: Basis for a StringDtype using Arrow #35259
if you can type
Can you type the input args as much as possible?
should this raise an error instead?
None needs to be replaced here with pd.NA, I think?
Starting with pyarrow 1.0, there is a `pyarrow.compute.fill_null` that does this.
The problem with `pc.fill_null` is that it only supports scalars, whereas pandas also allows arrays as input to `fillna` and lets you limit the number of values to replace. Neither is supported by `fill_null`, so in these cases we need to fall back to object-based methods.
I have copied the fletcher implementation as a starting point.
`ChunkedArray` has an `nbytes` property nowadays, so I think this can be `return self.data.nbytes`.
This returns a pyarrow array, right? Probably want to convert it into a pandas BooleanArray (to use the nullable boolean dtype). `BooleanDtype.__from_arrow__` implements a conversion (although I think that needs to be optimized; separate issue though).
As this cannot be null, I will return a numpy array here. This is also what the current masked pandas arrays do.
No comment on what's preferable, but the interface does allow for non-ndarrays here. SparseArray.isna() returns a Sparse[bool] I think.