Description
Problem description
Attributes of Dataset coordinates are dropped or replaced when adding a DataArray with dimensions or coordinates that already exist in the Dataset. In addition, the order of the Dataset's coordinates can change when a DataArray is added.
Expected Behaviour
Attributes of Dataset coordinates should not be altered by adding a DataArray to the Dataset, and the order of existing coordinates should be preserved.
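One way to state this expectation as a check is to record the name and attributes of each coordinate, in order, before and after an assignment, and then compare the two records. The sketch below is only an illustration: `coord_snapshot` is a hypothetical helper, not part of xarray, and the result of the comparison depends on the installed xarray version.

```python
import numpy as np
import xarray as xr


def coord_snapshot(ds):
    """Return [(name, attrs), ...] for the coordinates, in Dataset order.

    Hypothetical helper for illustration; not part of xarray.
    """
    return [(name, dict(ds[name].attrs)) for name in ds.coords]


ds = xr.Dataset(
    coords={
        'x': ('x', np.arange(10, 20), {'meta': 'foo'}),
        'y': ('y', np.arange(20, 30), {'meta': 'bar'})})

before = coord_snapshot(ds)
ds['b'] = xr.DataArray(np.arange(10), dims='y')
after = coord_snapshot(ds)

# Under the expected behaviour the two snapshots are equal; with the
# behaviour reported in this issue they may differ.
print('coordinates unchanged:', before == after)
```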
More details and code examples
The following code shows the behaviour by adding new data variables to a Dataset using a tuple, a DataArray (dimension without coordinates), and a Variable:
import numpy as np
import xarray as xr

# Dataset with three coordinates, each carrying its own attributes
ds = xr.Dataset(
    coords={
        'x': ('x', np.arange(10, 20), {'meta': 'foo'}),
        'y': ('y', np.arange(20, 30), {'meta': 'bar'}),
        'z': ('z', np.arange(30, 40), {'meta': 'baz'})})
print(ds, end='\n\n')
ds.info()
print('\n\n====\n')

# Add data variables in three different ways
ds['a'] = 'x', np.arange(10)                      # tuple
ds['b'] = xr.DataArray(np.arange(10), dims='y')   # DataArray without coordinates
ds['c'] = xr.Variable('z', np.arange(10))         # Variable
print(ds, end='\n\n')
ds.info()
Output
<xarray.Dataset>
Dimensions:  (x: 10, y: 10, z: 10)
Coordinates:
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
  * y        (y) int64 20 21 22 23 24 25 26 27 28 29
  * z        (z) int64 30 31 32 33 34 35 36 37 38 39
Data variables:
    *empty*

xarray.Dataset {
dimensions:
    x = 10 ;
    y = 10 ;
    z = 10 ;

variables:
    int64 x(x) ;
        x:meta = foo ;
    int64 y(y) ;
        y:meta = bar ;
    int64 z(z) ;
        z:meta = baz ;

// global attributes:
}
====
<xarray.Dataset>
Dimensions:  (x: 10, y: 10, z: 10)
Coordinates:
  * y        (y) int64 20 21 22 23 24 25 26 27 28 29
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
  * z        (z) int64 30 31 32 33 34 35 36 37 38 39
Data variables:
    a        (x) int64 0 1 2 3 4 5 6 7 8 9
    b        (y) int64 0 1 2 3 4 5 6 7 8 9
    c        (z) int64 0 1 2 3 4 5 6 7 8 9

xarray.Dataset {
dimensions:
    x = 10 ;
    y = 10 ;
    z = 10 ;

variables:
    int64 y(y) ;
    int64 x(x) ;
        x:meta = foo ;
    int64 z(z) ;
        z:meta = baz ;
    int64 a(x) ;
    int64 b(y) ;
    int64 c(z) ;

// global attributes:
}
The output shows that the attributes and the order of the Dataset's coordinates are preserved (as expected) when adding data variables using a tuple or a Variable, but when using a DataArray instead, the attributes of the related coordinates are dropped and the ordering of the Dataset's coordinates changes.
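Since assignment via a Variable leaves the existing coordinates untouched in the output above, a possible workaround (a sketch, not a fix) is to assign the DataArray's underlying Variable, available as `DataArray.variable`. This assumes that the DataArray's own coordinates can be discarded and that its data is already ordered like the Dataset's dimension, because no alignment takes place.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    coords={
        'x': ('x', np.arange(10, 20), {'meta': 'foo'}),
        'y': ('y', np.arange(20, 30), {'meta': 'bar'})})

b = xr.DataArray(np.arange(10), dims='y')

# Assign the bare Variable (dims + data + attrs) rather than the DataArray,
# so that no coordinates are merged into the Dataset.
ds['b'] = b.variable

print(list(ds.coords))
print(ds.y.attrs)
```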
When adding DataArrays with coordinates to the Dataset, the attributes of the affected Dataset coordinates are replaced with the attributes of the DataArray's coordinates:
d = xr.DataArray(
    np.arange(10),
    coords=[('x', np.arange(10, 20), {'breakfast': 'eggs'})])
e = xr.DataArray(
    np.arange(10),
    coords=[('z', np.arange(40, 50), {'breakfast': 'spam'})])
print('d.x =', d.x, end='\n\n')
print('e.z =', e.z, end='\n\n')
ds['d'] = d
ds['e'] = e
print(ds, end='\n\n')
ds.info()
Output
d.x = <xarray.DataArray 'x' (x: 10)>
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
Coordinates:
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
Attributes:
    breakfast:  eggs

e.z = <xarray.DataArray 'z' (z: 10)>
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
Coordinates:
  * z        (z) int64 40 41 42 43 44 45 46 47 48 49
Attributes:
    breakfast:  spam

<xarray.Dataset>
Dimensions:  (x: 10, y: 10, z: 10)
Coordinates:
  * z        (z) int64 30 31 32 33 34 35 36 37 38 39
  * y        (y) int64 20 21 22 23 24 25 26 27 28 29
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
Data variables:
    a        (x) int64 0 1 2 3 4 5 6 7 8 9
    b        (y) int64 0 1 2 3 4 5 6 7 8 9
    c        (z) int64 0 1 2 3 4 5 6 7 8 9
    d        (x) int64 0 1 2 3 4 5 6 7 8 9
    e        (z) float64 nan nan nan nan nan nan nan nan nan nan

xarray.Dataset {
dimensions:
    x = 10 ;
    y = 10 ;
    z = 10 ;

variables:
    int64 z(z) ;
        z:breakfast = spam ;
    int64 y(y) ;
    int64 x(x) ;
        x:breakfast = eggs ;
    int64 a(x) ;
    int64 b(y) ;
    int64 c(z) ;
    int64 d(x) ;
    float64 e(z) ;

// global attributes:
}
This even happens for the DataArray e in the example above, which shares the dimension 'z' with the Dataset ds but has different coordinate values. In this case the data and coordinate values are handled as one would expect: the ds.e array is filled with NaNs (because the coordinate values do not match), and the ds.z coordinate values are not replaced by the DataArray's e.z coordinate values. But the attributes of the Dataset's coordinates (ds.z.attrs) are still replaced by the attributes of the DataArray's coordinates (e.z.attrs).
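Until the assignment itself preserves them, the Dataset's coordinate attributes could be saved before the assignment and written back afterwards. The following is a minimal sketch under that assumption; `setitem_keep_coord_attrs` is a hypothetical helper name, not an xarray function.

```python
import copy

import numpy as np
import xarray as xr


def setitem_keep_coord_attrs(ds, name, value):
    """Assign ds[name] = value, then restore the previous coordinate attrs.

    Hypothetical helper for illustration; not part of xarray.
    """
    # Remember the attributes of every coordinate before the assignment.
    saved = {k: copy.deepcopy(dict(ds[k].attrs)) for k in ds.coords}
    ds[name] = value
    # Write the saved attributes back onto the coordinates that still exist.
    for k, attrs in saved.items():
        if k in ds.coords:
            ds[k].attrs = attrs
    return ds


ds = xr.Dataset(
    coords={'z': ('z', np.arange(30, 40), {'meta': 'baz'})})
e = xr.DataArray(
    np.arange(10),
    coords=[('z', np.arange(40, 50), {'breakfast': 'spam'})])

setitem_keep_coord_attrs(ds, 'e', e)
print(ds.z.attrs)
```

Restoring the attributes this way does not address the change in coordinate order described above.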
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.17.2-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.7
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: 3.5.1
IPython: 6.4.0
sphinx: 1.7.4