Description
Hello,
I am not sure whether this is a bug or intended behaviour, but calling coarsen() on an xarray.Dataset drops the dataset's attributes.
Dataset Example
import xarray as xr
import numpy as np
var1 = np.linspace(10, 15, 100)
var2 = np.linspace(5, 10, 100)
coords = np.linspace(1, 10, 100)
dat = xr.Dataset(
    data_vars={'var1': ('coord', var1), 'var2': ('coord', var2)},
    coords={'coord': coords}
)
dat.attrs['model_id'] = 'model1'
# coarsen dataset
dat = dat.coarsen(coord=5).mean()
# print dataset
dat
Actual Output
<xarray.Dataset>
Dimensions: (coord: 20)
Coordinates:
* coord (coord) float64 1.182 1.636 2.091 2.545 ... 8.455 8.909 9.364 9.818
Data variables:
var1 (coord) float64 10.1 10.35 10.61 10.86 ... 14.14 14.39 14.65 14.9
var2 (coord) float64 5.101 5.354 5.606 5.859 ... 9.141 9.394 9.646 9.899
Expected Output
<xarray.Dataset>
Dimensions: (coord: 20)
Coordinates:
* coord (coord) float64 1.182 1.636 2.091 2.545 ... 8.455 8.909 9.364 9.818
Data variables:
var1 (coord) float64 10.1 10.35 10.61 10.86 ... 14.14 14.39 14.65 14.9
var2 (coord) float64 5.101 5.354 5.606 5.859 ... 9.141 9.394 9.646 9.899
Attributes:
model_id: model1
Problem Description
I believe the attributes should stay on the xarray.Dataset regardless of which operations are applied to it. For some operations an entry like model_id may no longer be accurate, because the result is no longer the original model output, but I believe that decision should be left to the user. Perhaps a warning in the docs would be sufficient. The current behaviour is also inconsistent with coarsen() on an xarray.DataArray, where the attributes are preserved (see the DataArray example below).
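Until this is resolved, a possible workaround is to copy the attributes back onto the result by hand. The following is a minimal sketch, not a statement of how xarray intends this to be handled; the names `ds` and `coarse` are chosen here for illustration, and the data mirrors the dataset example above.

```python
import numpy as np
import xarray as xr

# Rebuild the dataset from the example above.
ds = xr.Dataset(
    data_vars={
        "var1": ("coord", np.linspace(10, 15, 100)),
        "var2": ("coord", np.linspace(5, 10, 100)),
    },
    coords={"coord": np.linspace(1, 10, 100)},
)
ds.attrs["model_id"] = "model1"

# Coarsen, then explicitly re-attach the original dataset-level attributes.
# assign_attrs returns a new object, so the coarsened result is not mutated.
coarse = ds.coarsen(coord=5).mean().assign_attrs(ds.attrs)

print(coarse.attrs)
```

This only restores the dataset-level attributes; attributes on the individual variables would need the same treatment if they are also dropped.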
DataArray Example
import pandas as pd
data = np.random.rand(50, 3)
locs = ['IA', 'IL', 'IN']
times = pd.date_range('2000-01-01', periods=50)
foo = xr.DataArray(data, coords=[times, locs], dims=['time', 'space'])
foo.attrs['data_id'] = 'data1'
# coarsen data array
foo = foo.coarsen(time=5).mean()
# print data array
foo
Expected/Actual Output
<xarray.DataArray (time: 10, space: 3)>
array([[0.3537571 , 0.50698482, 0.35923528],
[0.62127828, 0.41852822, 0.5617278 ],
[0.38669858, 0.60446037, 0.45699182],
[0.41538186, 0.81251298, 0.3919821 ],
[0.67914214, 0.45866817, 0.58625095],
[0.63560785, 0.53796635, 0.48231731],
[0.60802724, 0.54003065, 0.38456255],
[0.46492592, 0.78542293, 0.50788668],
[0.53757801, 0.56765902, 0.52288412],
[0.51085502, 0.51448292, 0.67426125]])
Coordinates:
* time (time) datetime64[ns] 2000-01-03 2000-01-08 ... 2000-02-17
* space (space) <U2 'IA' 'IL' 'IN'
Attributes:
data_id: data1
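To compare the two cases side by side on a given installation, a small check along these lines may help. It is only a sketch: the helper `attrs_survive_coarsen` is a hypothetical name introduced here, and the result depends on the installed xarray version, so no particular output is claimed.

```python
import numpy as np
import xarray as xr


def attrs_survive_coarsen(obj, **windows):
    """Return True if obj.attrs is unchanged after coarsen(...).mean().

    Hypothetical helper for this report, not part of xarray.
    """
    result = obj.coarsen(**windows).mean()
    return dict(result.attrs) == dict(obj.attrs)


# A small DataArray and a Dataset wrapping the same values, each with one attribute.
da = xr.DataArray(
    np.arange(20.0),
    dims="x",
    coords={"x": np.arange(20)},
    attrs={"data_id": "data1"},
)
ds = xr.Dataset({"var1": ("x", np.arange(20.0))}, coords={"x": np.arange(20)})
ds.attrs["model_id"] = "model1"

da_ok = attrs_survive_coarsen(da, x=5)
ds_ok = attrs_survive_coarsen(ds, x=5)
print("DataArray attrs kept:", da_ok)
print("Dataset attrs kept:", ds_ok)
```

If the two printed values differ, the installed version shows the inconsistency described in this report.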
Output of xr.show_versions()
xarray: 0.13.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.21
cfgrib: None
iris: None
bottleneck: None
dask: 2.4.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None