[ArrayManager] BUG: fix setitem with non-aligned boolean dataframe #39539
Changes from all commits
7eee352
c4da68c
2476591
f4008e4
024d5b8
@@ -8801,6 +8801,13 @@ def _where(
         if axis is not None:
             axis = self._get_axis_number(axis)
 
+        cond_orig = cond
+
+        # Needed for DataFrames with ArrayManager, see below for details
+        all_bool_columns = False
+        if isinstance(cond, ABCDataFrame) and cond._has_array_manager:
+            all_bool_columns = all(is_bool_dtype(dt) for dt in cond.dtypes)
+
         # align the cond to same shape as myself
         cond = com.apply_if_callable(cond, self)
         if isinstance(cond, NDFrame):
@@ -8812,9 +8819,32 @@ def _where(
                 raise ValueError("Array conditional must be same shape as self")
             cond = self._constructor(cond, **self._construct_axes_dict())
 
+        # Needed for DataFrames with ArrayManager, see below for details
+        if (
+            isinstance(cond, ABCDataFrame)
+            and cond._has_array_manager
+            and isinstance(cond_orig, ABCSeries)
+        ):
+            all_bool_columns = is_bool_dtype(cond_orig.dtype)
+
         # make sure we are boolean
         fill_value = bool(inplace)
-        cond = cond.fillna(fill_value)
+        try:
+            cond = cond.fillna(fill_value)
+        except TypeError:
+            # With ArrayManager, fillna can raise an error if `cond` is not
+            # of boolean dtype
+            raise ValueError("Boolean array expected for the condition")
+
+        # With ArrayManager, `fillna` does not automatically change object dtype
+        # back to bools (if the alignment made it object by introducing NaNs).
+        # So in this case we cast back to bool manually *if* the original columns
+        # before aligning were bool
+        # TODO this workaround can be removed once we have nullable boolean dtype
+        # as default
+        if isinstance(cond, ABCDataFrame) and cond._has_array_manager:
+            if not all(is_bool_dtype(dt) for dt in cond.dtypes) and all_bool_columns:
+                cond = cond.astype(bool)
 
         msg = "Boolean array expected for the condition, not {dtype}"
 

Is there a viable way to put these patches in with the ArrayManager code?

I don't directly see one, as we don't want to change (IMO) the

Left same comment above. I think we are going to want to push more code to the manager (kind of the opposite of what we have been doing of late, though).
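The dtype problem that the added block works around can be reproduced with public pandas API alone. The following is a minimal sketch, not the code from this PR: the frames `df` and `cond` are made-up data, and it assumes nothing about ArrayManager. Whether `fillna` on its own restores a bool dtype depends on the pandas version and manager, which is why the sketch casts explicitly.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

# Illustrative data (assumption): `cond` covers only part of `df`.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
cond = pd.DataFrame({"a": [True, False]}, index=[0, 1])

# Record the dtypes *before* aligning, as the patch does.
all_bool_columns = all(is_bool_dtype(dt) for dt in cond.dtypes)

# Aligning to the labels of `df` introduces NaNs for the missing row and
# column, so the aligned columns can no longer be of bool dtype.
aligned, _ = cond.align(df, join="right")
aligned_is_bool = all(is_bool_dtype(dt) for dt in aligned.dtypes)

# Fill the gaps, then cast back only if the original columns were all bool.
fill_value = False
filled = aligned.fillna(fill_value)
if all_bool_columns and not all(is_bool_dtype(dt) for dt in filled.dtypes):
    filled = filled.astype(bool)
```

The flag has to be computed before `align`, because afterwards the information that the columns were originally bool is lost.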
Is it possible to just do this inside the manager itself? I know this might mean a slight refactoring here, but it would be better if we could do that.

So basically this code does two steps for `cond`: 1) align and 2) fillna. The way that I wrote this now, those are a bit intertwined (I check something about the dataframe that is needed for fillna before the alignment step), but I don't think we want to push the alignment inside the manager? (That's something that is now always done outside, I think; the manager doesn't need to care about that.)

As mentioned below, I could call `cond._mgr.fillna(..)` instead of `cond.fillna()`, and then add a keyword for whether to force it to be boolean or not (what is now encoded in `all_bool_columns`). But with that idea, the code to determine `all_bool_columns` would still live here, and thus it would only move part of the added complexity into the internals.
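The split described in the last paragraph could look roughly like this. It is only a sketch under assumptions: `fillna_cond` and its `force_bool` keyword are hypothetical stand-ins for a manager-level `fillna` (no such keyword is claimed to exist in pandas), `prepare_cond` is a made-up caller, and everything goes through public API instead of `_mgr`.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def fillna_cond(cond, fill_value, force_bool=False):
    """Hypothetical stand-in for a manager-level fillna with a `force_bool` keyword."""
    filled = cond.fillna(fill_value)
    # Only cast when the caller vouches that the columns were bool originally.
    return filled.astype(bool) if force_bool else filled


def prepare_cond(cond, target, inplace=False):
    """Hypothetical caller: the dtype check still has to happen here."""
    # Must be determined before aligning, so it cannot move into fillna_cond.
    all_bool_columns = all(is_bool_dtype(dt) for dt in cond.dtypes)
    aligned, _ = cond.align(target, join="right")
    return fillna_cond(aligned, bool(inplace), force_bool=all_bool_columns)


target = pd.DataFrame({"a": [1, 2, 3]})
bool_cond = pd.DataFrame({"a": [True, False]}, index=[0, 1])
int_cond = pd.DataFrame({"a": [1, 0]}, index=[0, 1])

from_bool = prepare_cond(bool_cond, target)
from_int = prepare_cond(int_cond, target)
```

With a bool condition the result is cast back to bool; with an integer condition the flag stays `False`, the result keeps a non-bool dtype, and a later dtype check can still reject it. This is why an unconditional `astype(bool)` would not be equivalent.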