Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assign_coords with mixed DataArray / array args removes coords #3483

Open
keewis opened this issue Nov 4, 2019 · 5 comments
Open

assign_coords with mixed DataArray / array args removes coords #3483

keewis opened this issue Nov 4, 2019 · 5 comments

Comments

@keewis
Copy link
Collaborator

keewis commented Nov 4, 2019

I'm not sure if using assign_coords to overwrite the data of coords is the best way to do so, but using mixed args (on current master) turns out to have surprising results:

>>> obj = xr.DataArray(
...     data=[6, 3, 4, 6],
...     coords={"x": list("abcd"), "y": ("x", range(4))},
...     dims="x",
... )
>>> obj
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c' 'd'
    y        (x) int64 0 1 2 3
>>> # works as expected
>>> obj.assign_coords(coords={"x": list("efgh"), "y": ("x", [0, 2, 4, 6])})
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6
>>> # works, too (same as .data / .values)
>>> obj.assign_coords(coords={
...     "x": obj.x.copy(data=list("efgh")).variable,
...     "y": ("x", [0, 2, 4, 6]),
... })
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6
>>> # this drops "y"
>>> obj.assign_coords(coords={
...     "x": obj.x.copy(data=list("efgh")),
...     "y": ("x", [0, 2, 4, 6]),
... })
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'

Passing a DataArray for y, like obj.y * 2 while also changing x (the type does not matter) always results in a MergeError:

>>> obj.assign_coords(x=list("efgh"), y=obj.y * 2)
xarray.core.merge.MergeError: conflicting values for index 'x' on objects to be combined:
first value: Index(['e', 'f', 'g', 'h'], dtype='object', name='x')
second value: Index(['a', 'b', 'c', 'd'], dtype='object', name='x')

I would expect the result to be the same regardless of the type of the new coords.

@dcherian
Copy link
Contributor

dcherian commented Nov 4, 2019

The x=DataArray example looks like a bug.

The last example raises a MergeError regardless of the order of x, y which is good I think. It's pretty ambiguous to change both the index x and the y=function(x) at the same time. It will only work if xarray uses magic and infers that everything but x must be assigned before assigning x.

@keewis
Copy link
Collaborator Author

keewis commented Nov 4, 2019

would it not be possible to always convert DataArray args using .variable? Unless I'm missing something, this should keep the information on the dimensions while also allowing to change the data independently:

obj.assign_coords(x=obj.x.copy(data=list("efgh")).variable, y=(obj.y * 2).variable)
obj.assign_coords(x=list("efgh"), y=(obj.y * 2).variable)

both work

@dcherian
Copy link
Contributor

dcherian commented Nov 4, 2019

That is a good point. Let's see what @shoyer thinks.

@shoyer
Copy link
Member

shoyer commented Nov 4, 2019

would it not be possible to always convert DataArray args using .variable? Unless I'm missing something, this should keep the information on the dimensions while also allowing to change the data independently:

It might make sense to change how assign_coords/ds.coords[k] = v] works, but in general we do want assign/ds[k] = v to add in coordinates from assigned DataArray objects rather than dropping them silently.

@keewis
Copy link
Collaborator Author

keewis commented Nov 6, 2019

agreed, my suggestion only applies to assign_coords, and even in that function it might make sense to keep the old behaviour unless the coord was explicitly set.

Edit: this could be fixed by changing how merge_coords and _get_priority_vars_and_indexes work, but they look like magic to me

keewis added a commit to keewis/xarray that referenced this issue Dec 5, 2019
max-sixty pushed a commit that referenced this issue Dec 9, 2019
* don't use the index to construct the condition array

* use drop_vars to drop variables and coordinates

* update the xfail reason for astype and item

* remove the xfails due to np.result_type not being implemented

* make array_extract_units a bit more robust

* add the missing dataset apply_ufunc test

* use None as the dict key for quantities

* convert to variable to avoid the merge / assign_coords issue

relevant issue: #3483

* update the align tests

* fix the bugs introduced by converting to variables

* update a few more tests

* update the aggregation tests

* update a few more tests

* don't test rank which was deprecated in numpy 1.9

* update the DataArray.fillna tests

* update most of the remaining DataArray tests

* remove a debug assert

* fix the broadcast_equals tests

* update the indexing tests

* update the tests depending on einsum

* update the squeeze tests

* fix a bug in the head / tail / thin test design

* use dictionaries to index in the loc tests

* update the expected unit of the computation tests

* update the grouped operations tests

* update the where tests

* update most of the remaining dataset tests

* create new tests for drop_sel

* final batch of updated tests

* rename result to actual to match the other test files

* fix some more test bugs

* update the xfail marks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants