New defaults for concat, merge, combine_* #10062
base: main
Conversation
coords="different", | ||
compat="equals", | ||
join="outer", | ||
) |
I hard-coded these to the old defaults since there is no way for the user to set them.
I agree with this approach. These options result in confusing groupby behaviour (#2145), but we can tackle that later.
Force-pushed from 0e65034 to 5461a9f
The last test file that I need to work on is test_concat.py
Thanks for taking this on!
I'd like to avoid adding the extra option. Can you remind us why that would be useful, please?
coords="different", | ||
compat="equals", | ||
join="outer", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this approach. These options result in confusing groupby behaviour (#2145) but we can tackle that later
xarray/tests/test_merge.py
Outdated
    assert_identical(ds2, actual)

-   actual = ds2.merge(ds1)
+   actual = ds2.merge(ds1, compat="override")
OK, this is dangerous: the dimensionality of 'x' in the output depends on the order of merging. I intended compat="override" to be used when "x" is the same in all datasets, not merely 'compatible' as it is here. It seems like we would want to, at minimum, assert that ndim is the same for 'x' in all datasets.
Opened #10094
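To make the order dependence concrete, here is a small hypothetical example (the datasets are made up; only the compat="override" behaviour of keeping the first occurrence of each variable is from xarray):

```python
import numpy as np
import xarray as xr

# Hypothetical datasets in which "x" has different dimensionality.
ds_scalar = xr.Dataset({"x": 1})                    # 0-d "x"
ds_vector = xr.Dataset({"x": ("t", np.arange(3))})  # 1-d "x" along "t"

# compat="override" keeps the first occurrence of each variable, so the
# dimensionality of "x" in the result depends on the merge order.
print(xr.merge([ds_scalar, ds_vector], compat="override")["x"].ndim)  # 0
print(xr.merge([ds_vector, ds_scalar], compat="override")["x"].ndim)  # 1
```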
xarray/core/concat.py
Outdated
compat,
positions,
dim=dim,
data_vars="all",
Since this is unsettable, I hard-coded it to the old default.
The option is part of the deprecation process. Basically, how it works is: in this PR the default values for these kwargs do not change, BUT you get a warning when the result would change under the new defaults. This is how I'm thinking about the plan after this PR lands:
I guess you could get rid of the option; the tradeoff is that to get rid of the ...
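A hedged sketch of how the transition reads from a user's perspective, assuming the use_new_combine_kwarg_defaults option from this PR is exposed through xr.set_options (the datasets and the comment about warning behaviour are illustrative):

```python
import xarray as xr

ds0 = xr.Dataset({"a": ("x", [1, 2])}, coords={"x": [0, 1]})
ds1 = xr.Dataset({"a": ("x", [3, 4])}, coords={"x": [2, 3]})

# Old defaults stay in effect for now; a FutureWarning is intended to fire
# only when the result would change once the new defaults land.
combined = xr.concat([ds0, ds1], dim="x")

# Opt in to the new defaults early via the option:
with xr.set_options(use_new_combine_kwarg_defaults=True):
    combined_new = xr.concat([ds0, ds1], dim="x")

# Alternatively, pass the kwargs explicitly (e.g. data_vars, coords, join,
# compat) so the result does not depend on which defaults are active.
```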
This is ready for review! The failing test looks like it is also failing on main.
Wellll maybe the unit tests will pass now. I'll fix mypy and the doctests next week.
The mypy failures now match those in other PRs - I opened #10110 to track them.
The second argument to ``concat`` can also be an :py:class:`~pandas.Index` or
:py:class:`~xarray.DataArray` object as well as a string, in which case it is
used to label the values along the new dimension:

.. ipython:: python

-    xr.concat([da.isel(x=0), da.isel(x=1)], pd.Index([-90, -100], name="new_dim"))
+    xr.concat([da0, da1], pd.Index([-90, -100], name="new_dim"))
Same here.
@jsignell It looks like this is close to finished. Would you mind fixing the conflicts?
Sure! I'll take a look later today!
Co-authored-by: Kai Mühlbauer <kmuehlbauer@wradlib.org>
for more information, see https://pre-commit.ci
I'm working on the mypy errors
Looks like only flaky tests left!
I noticed a few things, and I'll have to revisit the ignored warnings in the units tests at some point, but for now these can stay (I'll fix in a separate PR). Or maybe we should just add a module fixture that opts into the new behavior?
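For reference, the module fixture idea could look roughly like this (a sketch, assuming the option is exposed through xr.set_options; the fixture name is made up):

```python
import pytest
import xarray as xr


@pytest.fixture(autouse=True, scope="module")
def _new_combine_defaults():
    # Every test in this module runs with the new combine kwarg defaults.
    with xr.set_options(use_new_combine_kwarg_defaults=True):
        yield
```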
The flaky CI passed after rerunning, so we're all good there.
da0 = da.isel(x=0).drop_vars("x")
da1 = da.isel(x=1).drop_vars("x")

xr.concat([da0, da1], "new_dim")
if we keep this as suggested in the PR I'd go with
da0 = da.isel(x=0, drop=True)
da1 = da.isel(x=1, drop=True)
data_vars: Literal["all", "minimal", "different"]
| list[str]
| CombineKwargDefault = _DATA_VARS_DEFAULT,
coords=_COORDS_DEFAULT,
I don't know anything about the context and I'm really bad at typing (so feel free to disregard / punt to a different PR), but shouldn't coords have the same type hints as data_vars?
Probably? I was trying to limit the scope of this PR as much as possible, since it's already pretty big. So I would prefer to punt this. When you add types there is always the possibility of breaking a bunch of stuff...
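For what it's worth, the suggested change would amount to something like the sketch below. It is illustrative only: CombineKwargDefault and the sentinel defaults are placeholders standing in for the objects added in this PR, and the real concat signature has many more parameters.

```python
from __future__ import annotations

from typing import Literal


class CombineKwargDefault:
    """Placeholder for the sentinel type introduced in this PR."""


_DATA_VARS_DEFAULT = CombineKwargDefault()
_COORDS_DEFAULT = CombineKwargDefault()


def concat(
    objs,
    dim,
    data_vars: Literal["all", "minimal", "different"]
    | list[str]
    | CombineKwargDefault = _DATA_VARS_DEFAULT,
    # the suggestion: give coords the same union as data_vars
    coords: Literal["all", "minimal", "different"]
    | list[str]
    | CombineKwargDefault = _COORDS_DEFAULT,
):
    ...
```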
xarray/structure/combine.py
Outdated
@@ -338,18 +352,21 @@ def _new_tile_id(single_id_ds_pair):

 def _nested_combine(
     datasets,
-    concat_dims,
+    concat_dim,
Can we ever see multiple dims here (as a list)? If so, we might have to reconsider the rename; otherwise, the condition where we wrap concat_dim in a list could just always happen. We do support hashables in most places (but are not really consistent right now), so a tuple dimension name could break things (not sure, though).
Yeah, you can have multiple dims here. I guess I was pushing the handling of the single string value into this function, but I can move it back up to a higher level or just change the name to concat_dims. I honestly don't totally remember why I thought it was better to wrap the input in a list at this level rather than higher up.
I guess I just saw the string -> list handling in both of the places where _nested_combine is used, so that's why I moved it down into this function.
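A rough sketch of the normalization being discussed, assuming it lives inside _nested_combine (the signature and body are illustrative, not the actual code in xarray/structure/combine.py):

```python
def _nested_combine(datasets, concat_dim, **kwargs):
    # Accept either a single concat dimension or a list of them; a lone value
    # (str, None, DataArray, ...) gets wrapped here instead of at each call site.
    # Per the caveat above, hashable non-str dimension names (e.g. tuples) would
    # need care if the check were based on "not a string" rather than "not a list".
    concat_dims = concat_dim if isinstance(concat_dim, list) else [concat_dim]
    ...
```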
Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
Replaces #10051
- FutureWarnings
- whats-new.rst
- New functions/methods are listed in api.rst
This PR attempts to throw warnings if and only if the output would change with the new kwarg defaults. To exercise the new default I am toggling the option back and forth and running the existing test suite.
With new kwarg values

- use_new_combine_kwarg_defaults=True -> 78 failed
- use_new_combine_kwarg_defaults=False and run the last failed -> 76 failed, 2 passed

Those 2 are missed alarms. In these two cases the behavior will be different when xarray switches to the new default kwarg values, and there is no warning. But I am not totally sure whether we need to handle them, because they are tests for conditions that used to raise an error and with the new defaults they do not.
With old kwarg values

- use_new_combine_kwarg_defaults=False -> 119 failed
- use_new_combine_kwarg_defaults=True and run the last failed -> 76 failed, 43 passed

Those 43 are potentially false alarms - they could also be tests that just don't assert that the result exactly matches some fixed expectation - but mostly they are false alarms.
Here is a list of them
About half of them are triggered by my attempt to catch cases where different datasets have matching overlapping variables and with compat='no_conflicts' you might get different results than with compat='override'. There might be a better way, but we want to be careful to avoid calling compute.

Updating tests
After investigating the impact on existing tests, I updated the tests to make sure they still test what they were meant to test by passing in any required kwargs, and I added new tests that explicitly ensure that, for a bunch of different inputs, using the old defaults throws a warning OR the output is the same with the new and old defaults.
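The new tests roughly follow this pattern (a sketch, assuming the option is settable via xr.set_options; build_inputs and the single concat call stand in for the many input combinations actually exercised):

```python
import warnings

import xarray as xr
from xarray.testing import assert_identical


def check_warns_or_matches(build_inputs):
    # With the new defaults: the future behavior.
    with xr.set_options(use_new_combine_kwarg_defaults=True):
        new_result = xr.concat(build_inputs(), dim="x")

    # With the old defaults: either a FutureWarning fires, or the result
    # already matches the future behavior.
    with xr.set_options(use_new_combine_kwarg_defaults=False):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            old_result = xr.concat(build_inputs(), dim="x")

    if not any(issubclass(w.category, FutureWarning) for w in caught):
        assert_identical(old_result, new_result)
```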
Notes