-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
DataArray.isin does not accept sets intuitively #10022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I agree with you. This is the issue really: >>> import numpy as np
>>> np.array({1, 2, 3})
array({1, 2, 3}, dtype=object) IMO special-casing a set makes sense to me. |
Agree! We would take a PR... |
Sure thing, I can make a PR! |
Can we make breaking changes? Pandas performs an Personally, I really like this. |
If worthwhile! But with a fairly high bar. To the extent you're proposing the pandas approach: what's the difference between "special-case a set" and that? |
Maybe breaking change is not the correct term, but currently the xarray isin works as follows: ints = xr.DataArray([1,2,3,4])
strs = xr.DataArray(["a", "ab", "abc"])
ints.isin(1)
# array([ True, False, False, False])
ints.isin(np.array(1))
# array([ True, False, False, False])
ints.isin({1,2})
# array([ False, False, False, False])
ints.isin(xr.DataArray([1, 2]))
# array([ True, True, False, False])
ints.isin(xr.DataArray([1, 2]).to_dataset(name="hello"))
# TypeError
ints.isin(xr.DataArray(1))
# array([ True, False, False, False])
strs.isin("abc")
# array([ False, False, True])
strs.isin(["abc"])
# array([ False, False, True])
strs.isin({"a", "b"})
# array([ False, False, False]) and we could consider following the pandas implementation: ints = xr.DataArray([1,2,3,4])
strs = xr.DataArray(["a", "ab", "abc"])
ints.isin(1)
# TypeError / ValueError
ints.isin(np.array(1))
# ValueError, it is a scalar
ints.isin({1,2})
# array([ True, True, False, False])
ints.isin(xr.DataArray([1, 2]))
# array([ True, True, False, False])
ints.isin(xr.DataArray([1, 2]).to_dataset(name="hello"))
# TypeError
ints.isin(xr.DataArray(1))
# ValueError (also a scalar)
strs.isin("abc")
# ValueError / TypeError
strs.isin(["abc"])
# array([ False, False, True])
strs.isin({"a", "b"})
# array([ True, True, False]) So the main difference between special casing a set and following the pandas approach is a much stricter validation is the value passed is list-like. Anything that is not list-like (or set-like) will result in a ValueError. |
OK, so aside from sets, the difference is in the first few IIUC — if pandas is passed something like a scalar or array, it'll raise an error. I somewhat understand scalars. I don't immediately see why refusing arrays is better. So in answer to your question above, I don't see adding strictness for an abstract reason is sufficient for a breaking change; we'd need to show if it was actively confusing or causing pain in some way — and only the sets case seems to pass that test... |
It will accept regular arrays. It will just not accept numpy scalars. I believe it checks that
The only other one that I find actively confusing is the string one strs.isin("abc")
# is not the same as
strs.isin(list("abc")) Anyhow, thanks for the input, it should just be a small change then :-) |
Ah great, thanks, I missed that. That weighs me a bit towards disallowing it. But probably not enough, unless others have views... |
I like the idea of raising for a string, just like the other scalars. We can do a short deprecation cycle for that one specifically. |
What is your issue?
This is halfway between an feature request and a bug report. I do not find the way that DataArray.isin handles sets is very intuitive:
Example:
Which results in:
Personally, I find only pandas handles this in an intuitive way. But I also understand, that you might want to stay consistent with numpy and dask.
I'd be interested to see what you think!
The text was updated successfully, but these errors were encountered: