Add option to choose the source of global attributes in mfdataset. #2382

Closed
@juseg

Code Sample

import numpy as np
import xarray as xr

# prepare fake data
time = np.arange(1000)
data = time**2

# write to multiple netcdf files
for i in range(10):
    filename = 'ds{:d}.nc'.format(i)
    ds = xr.Dataset({'data': (['time'],  data[100*i:100*i+100])},
                    coords={'time': time[100*i:100*i+100]},
                    attrs={'history': 'Created ' + filename + '.'})
    ds.to_netcdf(filename)

# open as mfdataset
with xr.open_mfdataset('ds?.nc') as ds:
    # print() with parentheses works on both Python 2 and Python 3
    print(ds.history)

Problem description

Currently, global attributes of multi-file datasets are taken from the first file in the list.

combined.attrs = datasets[0].attrs

I think this is a problem in the context of consecutive model runs, where history is appended in each subsequent run. When opening the results as an mfdataset, history is taken from the first run file and all subsequent history is lost.

NetCDF4 has a new keyword argument to set the master_file in a MFDataset (Unidata/netcdf4-python#835). Would it be possible to add a similar option in xarray?
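
To make the request concrete, here is a minimal sketch of the selection logic I have in mind. It works on plain dictionaries standing in for the global attributes of each file, so it runs without any netCDF files. The function name `select_global_attrs` and its `master` argument are hypothetical and only for illustration; they are not part of xarray or netCDF4.

```python
from typing import Mapping, Sequence


def select_global_attrs(attrs_per_file: Sequence[Mapping], master: int = 0) -> dict:
    """Return a copy of the global attributes of one file in the list.

    `master` is a hypothetical option (not an xarray parameter): the index
    of the file whose attributes should be used. The default, 0, mimics the
    current behaviour of taking attributes from the first file.
    """
    if not attrs_per_file:
        raise ValueError("need at least one set of attributes")
    # Copy so that later edits do not modify the source attributes.
    return dict(attrs_per_file[master])


# Stand-in for the attributes of consecutive model runs, where each run
# appends a line to the history inherited from the previous run.
attrs_per_file = []
history = ""
for i in range(3):
    history += "Created ds{:d}.nc.\n".format(i)
    attrs_per_file.append({"history": history})

first = select_global_attrs(attrs_per_file)            # current behaviour
last = select_global_attrs(attrs_per_file, master=-1)  # proposed alternative
```

Here `first` only knows about the first run, while `last` carries the full accumulated history. As a workaround with xarray today, something along these lines could presumably be applied by hand: open the desired file on its own, then assign a copy of its `attrs` to the `attrs` of the combined dataset.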

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.65-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CH.utf8
LOCALE: None.None

xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.0
scipy: 1.1.0
netCDF4: 1.4.2
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.18.2
distributed: None
matplotlib: 2.2.3
cartopy: 0.15.1
seaborn: None
setuptools: 40.0.0
pip: 18.0
conda: None
pytest: None
IPython: 5.8.0
sphinx: 1.7.6
