-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New defaults for concat
, merge
, combine_*
#10062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
coords="different", | ||
compat="equals", | ||
join="outer", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hard-coded these to the old defaults since there is no way for the user to set them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this approach. These options result in confusing groupby behaviour (#2145) but we can tackle that later
0e65034
to
5461a9f
Compare
The last test file that I need to work on is test_concat.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this on!
I'd like to avoid adding the extra option. Can you remind us of what that would be useful please?
coords="different", | ||
compat="equals", | ||
join="outer", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this approach. These options result in confusing groupby behaviour (#2145) but we can tackle that later
xarray/tests/test_merge.py
Outdated
assert_identical(ds2, actual) | ||
|
||
actual = ds2.merge(ds1) | ||
actual = ds2.merge(ds1, compat="override") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok this is dangerous, the dimensionality of 'x'
in the output depends on the order of merging.
I intended compat="override"
to be used when "x"
is the same in all datasets, not merely 'compatible' as is here. It seems like we would want to at minimum, assert that ndim
is same for all x
in all datasets.
Opened #10094
|
||
assert_identical(old, new) | ||
|
||
def test_merge(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice test.
xarray/core/concat.py
Outdated
compat, | ||
positions, | ||
dim=dim, | ||
data_vars="all", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is unsettable - I hardcoded it to the old default.
The option is part of the deprecation process. Basically how it works is in this PR the default values for these kwargs do not change. BUT you get This is how I'm thinking about the plan after this PR lands:
I guess you could get rid of the option the tradeoff is that to get rid of the |
This is ready for review! The failing test looks like it is also failing on main. |
Wellll maybe the unit tests will pass now. I'll fix mypy and the doctests next week. |
The mypy failures now match those in other PRs - I opened #10110 to track |
da0 = da.isel(x=0).drop_vars("x") | ||
da1 = da.isel(x=1).drop_vars("x") | ||
|
||
xr.concat([da0, da1], "new_dim") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping the overlapping "x" means that you don't get a future warning anymore and the outcome won't change with the new defaults. It seemed to me like it was maintaining the spirit of the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd change to xr.concat([da.isel(x=[0]), da.isel(x=[1])], dim="new_dim")
. I think that preserves the spirit, and gets users closer to what we'd like them to type and understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That one will give a FutureWarning
about how join is going to change:
In [3]: xr.concat([da.isel(x=[0]), da.isel(x=[1])], "new_dim")
<ipython-input-3-8d3fee24c8e4>:1: FutureWarning: In a future version of xarray the default value for join will change from join='outer' to join='exact'. This change will result in the following ValueError:cannot be aligned with join='exact' because index/labels/sizes are not equal along these coordinates (dimensions): 'x' ('x',) The recommendation is to set join explicitly for this case.
xr.concat([da.isel(x=[0]), da.isel(x=[1])], "new_dim")
Out[3]:
<xarray.DataArray (new_dim: 2, x: 2, y: 3)> Size: 96B
array([[[ 0., 1., 2.],
[nan, nan, nan]],
[[nan, nan, nan],
[ 3., 4., 5.]]])
Coordinates:
* x (x) <U1 8B 'a' 'b'
* y (y) int64 24B 10 20 30
Dimensions without coordinates: new_dim
We can add an explicit join value to get rid of the warning or we can allow the docs to build with the warning (I think that is not a good idea because warnings in docs might scare people)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared with that with the example as it is on main:
In [3]: xr.concat([da.isel(x=0), da.isel(x=1)], "new_dim")
<ipython-input-8-5e17a4052d18>:1: FutureWarning: In a future version of xarray the default value for coords will change from coords='different' to coords='minimal'. This is likely to lead to different results when multiple datasets have matching variables with overlapping values. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set coords explicitly.
xr.concat([da.isel(x=0), da.isel(x=1)], "new_dim")
Out[3]:
<xarray.DataArray (new_dim: 2, y: 3)> Size: 48B
array([[0, 1, 2],
[3, 4, 5]])
Coordinates:
x (new_dim) <U1 8B 'a' 'b'
* y (y) int64 24B 10 20 30
Dimensions without coordinates: new_dim
|
||
The second argument to ``concat`` can also be an :py:class:`~pandas.Index` or | ||
:py:class:`~xarray.DataArray` object as well as a string, in which case it is | ||
used to label the values along the new dimension: | ||
|
||
.. ipython:: python | ||
|
||
xr.concat([da.isel(x=0), da.isel(x=1)], pd.Index([-90, -100], name="new_dim")) | ||
xr.concat([da0, da1], pd.Index([-90, -100], name="new_dim")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
Replaces #10051
FutureWarnings
whats-new.rst
New functions/methods are listed inapi.rst
This PR attempts to throw warnings if and only if the output would change with the new kwarg defaults. To exercise the new default I am toggling the option back and forth and running the existing test suite.
With new kwarg values
use_new_combine_kwarg_defaults=True
-> 78 faileduse_new_combine_kwarg_defaults=False
and run the last failed -> 76 failed, 2 passedThose 2 are missed alarms. In these two cases the behavior will be different when xarray switched to using new default kwarg values and there is no warning. But I am not totally sure whether we need to handle them because they are tests for conditions that used to raise an error and with the new defaults they do not.
With old kwarg values
use_new_combine_kwarg_defaults=False
-> 119 faileduse_new_combine_kwarg_defaults=True
and run the last failed -> 76 failed, 43 passedThose 44 are potentially false alarms - they could also be tests that just don't assert that the result exactly matches some fixed expectation - but mostly they are false alarms.
Here is a list of them
About half of them are triggered by my attempt to catch cases where different datasets have matching overlapping variables and with
compat='no_conflicts'
you might get different results than withcompat='override'
. There might be a better way, but we want to be careful to avoid calling compute.Updating tests
After investigating the impact on existing tests, I updated the tests to make sure they still test what they were meant to test by passing in any required kwargs and I added new tests that explicitly ensure that for a bunch of different inputs, using old defaults throws a warning OR output is the same with new and old defaults.
Notes