convert DataArray to DataSet before combine #3312
Conversation
Hello @friedrichknuth! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-09-17 21:05:00 UTC
Thanks @friedrichknuth! Welcome to xarray. This looks like a good start.

- Can you extend these changes to the other combine functions too please?
- You'll also need to add tests to test_combine.py. We have some documentation on that here: https://xarray.pydata.org/en/stable/contributing.html#using-pytest The example code in the issues and this PR should be a good start for the tests (a rough sketch of one such test follows below).
- You should add a line to whats-new.rst and take credit for your work.
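As a rough illustration of the kind of test that could go in test_combine.py, here is a sketch only, assuming the behaviour this PR proposes (combine_by_coords accepting named DataArrays and raising on unnamed ones); the test names and data are made up:

```python
import numpy as np
import pytest
import xarray as xr


def test_combine_by_coords_accepts_dataarrays():
    # two named DataArrays that line up along a shared dimension coordinate
    da1 = xr.DataArray(name='foo', data=np.random.rand(3),
                       coords=[('x', [1, 2, 3])])
    da2 = xr.DataArray(name='foo', data=np.random.rand(3),
                       coords=[('x', [4, 5, 6])])

    result = xr.combine_by_coords([da1, da2])

    # with this PR the DataArrays are converted to Datasets internally,
    # so a combined Dataset should come back
    assert isinstance(result, xr.Dataset)
    assert list(result['foo'].x.values) == [1, 2, 3, 4, 5, 6]


def test_combine_by_coords_rejects_unnamed_dataarrays():
    # an unnamed DataArray cannot be converted to a Dataset, so the PR
    # surfaces a clear ValueError instead of the ambiguous-truth-value one
    da = xr.DataArray(data=np.random.rand(3), coords=[('x', [1, 2, 3])])
    with pytest.raises(ValueError):
        xr.combine_by_coords([da, da])
```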
It looks like the existing combine_nested() routine actually wants a DataArray and fails if passed a Dataset: passing DataArrays directly should work with current master, while converting them to Dataset first causes the same error expressed by the test.
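To make the shape of that concrete, a minimal sketch (hypothetical data, not the original snippet) of the two call patterns being compared:

```python
import numpy as np
import xarray as xr

da1 = xr.DataArray(name='foo', data=np.random.rand(3),
                   coords=[('x', [1, 2, 3])])
da2 = xr.DataArray(name='foo', data=np.random.rand(3),
                   coords=[('x', [4, 5, 6])])

# reported above to work on current master: DataArrays passed straight in
xr.combine_nested([da1, da2], concat_dim='x')

# reported above to hit the same error as the test: the same inputs as Datasets
xr.combine_nested([da1.to_dataset(), da2.to_dataset()], concat_dim='x')
```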
Opened #3315 regarding combine_nested() failing when being passed nested Datasets.
Thinking about the interface here, would it make sense for the combine methods to always return an object of the same type as the inputs? E.g., list of DataArray -> DataArray, list of Dataset -> Dataset? Effectively, we would disable the automatic merging when working with DataArray objects and only do concat (ignoring DataArray names). I guess the confusing part is that in concat we always return the same type as the inputs, vs merge where we always return a Dataset. And of course, combine can do both!
I don't think so. Even for an input of only DataArrays, depending on the actual names and values in the DataArrays, the result of a combine could be a DataArray or a Dataset. So would it not be simpler to convert everything to Dataset objects, combine those, and then convert the result back to a DataArray where possible? That way the result is always the simplest object that can hold the result of combining that particular set of inputs, and the combining internals only have to handle Dataset objects. EDIT: Oh wait, but that won't work in the case of unnamed DataArrays, right?
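A rough sketch of that idea (the helper name is made up; this is not xarray's actual implementation):

```python
import xarray as xr


def _combine_via_datasets(objs, combine_func, **kwargs):
    # hypothetical helper: promote everything to Dataset, combine, then hand
    # back the simplest object that can hold the result
    datasets = [obj.to_dataset() if isinstance(obj, xr.DataArray) else obj
                for obj in objs]
    combined = combine_func(datasets, **kwargs)
    data_vars = list(combined.data_vars)
    if len(data_vars) == 1:
        # a single data variable can be returned as a DataArray again
        return combined[data_vars[0]]
    return combined
```

As the EDIT above points out, the to_dataset() step raises for unnamed DataArrays, so this sketch only covers named inputs.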
I agree. I think the simplest solution is to always return a Dataset.
If we gave all the DataArray objects the same name when converted into Dataset objects, then I think the result could always be a single DataArray? This would be consistent with how we do arithmetic with DataArrays: the names get ignored, and then we assign a name to the result only if there are no conflicting names on the inputs.
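A sketch of what that alternative might look like (hypothetical helper and placeholder name, not a real xarray API):

```python
import xarray as xr


def _combine_ignoring_names(dataarrays, combine_func, **kwargs):
    # hypothetical helper: give every input the same placeholder name so the
    # merge step never sees conflicting names, then recover a DataArray
    datasets = [da.rename('__placeholder__').to_dataset() for da in dataarrays]
    combined = combine_func(datasets, **kwargs)
    result = combined['__placeholder__']

    # mirror arithmetic: keep a name only if the inputs all agreed on one
    names = {da.name for da in dataarrays}
    result.name = names.pop() if len(names) == 1 else None
    return result
```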
I suppose so, but this seems like an odd way to handle it to me. You're throwing away data (the names) which in other circumstances would be used.

Do we want consistency with arithmetic, or consistency with the existing combining functions? Let me try to clarify by summarizing. Currently, with

```python
import numpy as np
import xarray as xr

da1 = xr.DataArray(name='foo',
                   data=np.random.rand(3, 3),
                   coords=[('x', [1, 2, 3]),
                           ('y', [1, 2, 3])])
da2 = xr.DataArray(name='foo2',
                   data=np.random.rand(3, 3),
                   coords=[('x', [5, 6, 7]),
                           ('y', [5, 6, 7])])
```

both

```python
xr.merge([da1, da2])
```

and

```python
xr.combine_nested([da1, da2], concat_dim=None)
```

return a Dataset containing both foo and foo2 as data variables, aligned along the union of the coordinates. This all makes intuitive sense to me.

However, as shown above, this behaviour relies on the DataArray names. This is all different to the arithmetic logic, but I think it makes way more intuitive sense. It's okay for arithmetic and combining logic to be different, as they are used in different contexts, and it's an unambiguous delineation to ignore names in arithmetic and use them in top-level combining functions. Also, to complete the consistency of the "combining" functions, I think we should make all of them handle DataArrays and their names in the same way. In short: I propose that "combining" isn't arithmetic, and should be treated separately (and consistently across all types of combine functions).
Also, separately, I think I've discovered a related weird bug:

```python
da1 = xr.DataArray([0], name='a')
da2 = xr.DataArray([1], name='b')

xr.combine_by_coords([da1, da2])
```

returns a combined result, which it shouldn't do: it should have just immediately failed, because there are no dimension coordinates, and otherwise single-element arrays should obviously behave the same as the other examples above. I think it's because of how it reads the single-element data.
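For context, a small check (illustrative only) showing why there is nothing for combine_by_coords to order by here: these one-element arrays carry a default dimension with no coordinate attached.

```python
import xarray as xr

da1 = xr.DataArray([0], name='a')
print(da1.dims)    # ('dim_0',) -- a default dimension name
print(da1.coords)  # empty: no dimension coordinate to sort or combine by
```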
@shoyer I just re-encountered this bug whilst doing actual work. I would like to get it fixed, but need your input, in particular on the questions above (if this is particularly complex we could perhaps discuss it in the bi-weekly meeting?).
Enables combine_by_coords on DataArrays by converting each DataArray to a Dataset before combining. As mentioned in #3248, this will still fail if a DataArray is unnamed, but at least the error message tells the user why.

Previously, combining named and unnamed DataArrays alike failed with:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

With this PR, combining named DataArrays results in a combined Dataset, while the unnamed case results in:

ValueError: unable to convert unnamed DataArray to a Dataset without providing an explicit name
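As an illustration of the two cases (hypothetical data; not the original snippets):

```python
import numpy as np
import xarray as xr

named = [xr.DataArray(name='foo', data=np.random.rand(3),
                      coords=[('x', [1, 2, 3])]),
         xr.DataArray(name='foo', data=np.random.rand(3),
                      coords=[('x', [4, 5, 6])])]
unnamed = [xr.DataArray(data=np.random.rand(3), coords=[('x', [1, 2, 3])]),
           xr.DataArray(data=np.random.rand(3), coords=[('x', [4, 5, 6])])]

# with this PR: the named DataArrays combine into a single Dataset
combined = xr.combine_by_coords(named)

# with this PR: raises ValueError explaining that an unnamed DataArray
# cannot be converted to a Dataset without an explicit name
xr.combine_by_coords(unnamed)
```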