
.coarsen() method for the xarray.Dataset removes its attributes. #3376

Closed
@jejjohnson

Description


Hello,

I am not sure whether this is a bug or a feature, but when one calls .coarsen() on an xarray.Dataset, its attributes are removed.

Dataset Example

import xarray as xr
import numpy as np

var1 = np.linspace(10, 15, 100)
var2 = np.linspace(5, 10, 100)
coords = np.linspace(1, 10, 100)

dat = xr.Dataset(
    data_vars={'var1': ('coord', var1), 'var2': ('coord', var2)}, 
    coords={'coord': coords}
)
dat.attrs['model_id'] = 'model1'

# coarsen dataset
dat = dat.coarsen(coord=5).mean()

# print dataset
dat

Actual Output

<xarray.Dataset>
Dimensions:  (coord: 20)
Coordinates:
  * coord    (coord) float64 1.182 1.636 2.091 2.545 ... 8.455 8.909 9.364 9.818
Data variables:
    var1     (coord) float64 10.1 10.35 10.61 10.86 ... 14.14 14.39 14.65 14.9
    var2     (coord) float64 5.101 5.354 5.606 5.859 ... 9.141 9.394 9.646 9.899
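For reference, coarsen(coord=5).mean() averages non-overlapping windows of 5 consecutive points, so the 100 input points become 20. The NumPy-only sketch below reproduces the values in the output above under that reading (the helper name `block_mean` is illustrative, not an xarray function). It shows that the data themselves are coarsened correctly; only the attributes go missing.

```python
import numpy as np

# Same inputs as the Dataset example above.
coords = np.linspace(1, 10, 100)
var1 = np.linspace(10, 15, 100)
var2 = np.linspace(5, 10, 100)

window = 5  # matches coarsen(coord=5)


def block_mean(values, window):
    """Illustrative helper: average non-overlapping blocks of `window` values.

    Assumes len(values) is an exact multiple of `window` (100 / 5 = 20 here).
    """
    return values.reshape(-1, window).mean(axis=1)


coarse_coord = block_mean(coords, window)
coarse_var1 = block_mean(var1, window)
coarse_var2 = block_mean(var2, window)

print(coarse_coord.shape)             # (20,)
print(np.round(coarse_coord[:2], 3))  # [1.182 1.636]
```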

Expected Output

<xarray.Dataset>
Dimensions:  (coord: 20)
Coordinates:
  * coord    (coord) float64 1.182 1.636 2.091 2.545 ... 8.455 8.909 9.364 9.818
Data variables:
    var1     (coord) float64 10.1 10.35 10.61 10.86 ... 14.14 14.39 14.65 14.9
    var2     (coord) float64 5.101 5.354 5.606 5.859 ... 9.141 9.394 9.646 9.899
Attributes:
    model_id:  model1

Problem Description

I believe the attributes should be kept on the xarray.Dataset regardless of the operation performed on it. For some operations an entry like model_id may no longer be accurate (the coarsened data is no longer the raw model output), but I think that should be left to the user to handle. At a minimum, a note in the docs might be sufficient. The behaviour is also inconsistent with DataArray.coarsen(), which keeps the attributes (see the DataArray example below).
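In the meantime, one workaround is to copy the attributes back by hand after coarsening. This is a minimal sketch, assuming only Dataset-level attrs need restoring; `coarsen_mean_keep_attrs` is a hypothetical helper, not part of xarray.

```python
import numpy as np
import xarray as xr


def coarsen_mean_keep_attrs(ds, **windows):
    """Hypothetical helper (not an xarray API): block-mean coarsening that
    copies the input Dataset's global attributes onto the result."""
    out = ds.coarsen(**windows).mean()
    # Copy the dict rather than sharing it, so later edits to one
    # Dataset's attrs do not silently change the other's.
    out.attrs = dict(ds.attrs)
    return out


dat = xr.Dataset(
    data_vars={'var1': ('coord', np.linspace(10, 15, 100)),
               'var2': ('coord', np.linspace(5, 10, 100))},
    coords={'coord': np.linspace(1, 10, 100)},
)
dat.attrs['model_id'] = 'model1'

coarse = coarsen_mean_keep_attrs(dat, coord=5)
print(coarse.attrs)  # {'model_id': 'model1'}
```

Per-variable attrs are left untouched here; if those are dropped too, they would need a similar copy.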

DataArray Example

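import numpy as np
import pandas as pd
import xarray as xr

data = np.random.rand(50, 3)
locs = ['IA', 'IL', 'IN']
times = pd.date_range('2000-01-01', periods=50)

foo = xr.DataArray(data, coords=[times, locs], dims=['time', 'space'])
foo.attrs['data_id'] = 'data1'

# coarsen the DataArray the same way (5-point windows along time)
foo = foo.coarsen(time=5).mean()
foo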

Expected/Actual Output

<xarray.DataArray (time: 10, space: 3)>
array([[0.3537571 , 0.50698482, 0.35923528],
       [0.62127828, 0.41852822, 0.5617278 ],
       [0.38669858, 0.60446037, 0.45699182],
       [0.41538186, 0.81251298, 0.3919821 ],
       [0.67914214, 0.45866817, 0.58625095],
       [0.63560785, 0.53796635, 0.48231731],
       [0.60802724, 0.54003065, 0.38456255],
       [0.46492592, 0.78542293, 0.50788668],
       [0.53757801, 0.56765902, 0.52288412],
       [0.51085502, 0.51448292, 0.67426125]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-03 2000-01-08 ... 2000-02-17
  * space    (space) <U2 'IA' 'IL' 'IN'
Attributes:
    data_id:  data1
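The time labels above (2000-01-03, 2000-01-08, ..., 2000-02-17) are consistent with coarsen also averaging the coordinate within each window (xarray's `coord_func` argument, which defaults to 'mean' as far as I know), so each label is the middle of five consecutive days. A pandas-only sketch of that arithmetic, assuming the 50-day `times` index from the example:

```python
import pandas as pd

times = pd.date_range('2000-01-01', periods=50)  # same index as the example
window = 5

# Work in integer nanoseconds so the average is exact, then convert back.
ns = times.values.astype('int64').reshape(-1, window)
block_means = pd.to_datetime(ns.sum(axis=1) // window)

print(len(block_means))                               # 10
print(block_means[0].date(), block_means[-1].date())  # 2000-01-03 2000-02-17
```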

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.13.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.21
cfgrib: None
iris: None
bottleneck: None
dask: 2.4.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None
