Concatenate across multiple dimensions with open_mfdataset #2159

Closed
@TomNicholas

Code Sample

import numpy as np
import xarray as xr

# Create 4 datasets, each holding one contiguous (x, y) tile of the full grid
for i, x in enumerate([1, 3]):
    for j, y in enumerate([10, 40]):
        ds = xr.Dataset({'foo': (('x', 'y'), np.ones((2, 3)))},
                        coords={'x': [x, x+1],
                                'y': [y, y+10, y+20]})

        # Files are written as ds.00.nc, ds.01.nc, ds.10.nc and ds.11.nc
        ds.to_netcdf('ds.' + str(i) + str(j) + '.nc')

# Try to open them all in one go
ds_read = xr.open_mfdataset('ds.*.nc')
print(ds_read)

Problem description

Currently, xr.open_mfdataset detects a single common dimension and concatenates the Datasets along that dimension. However, a common use case is a set of NetCDF files that share two or more dimensions and need to be concatenated along all of them at once (for example, when collecting the output of a large-scale simulation that is parallelized across more than one dimension). For xr.open_mfdataset to be truly n-dimensional, it should automatically recognise and concatenate along all common dimensions.

Expected Output

<xarray.Dataset>
Dimensions:  (x: 4, y: 6)
Coordinates:
  * x        (x) int64 1 2 3 4
  * y        (y) int64 10 20 30 40 50 60
Data variables:
    foo      (x, y) float64 dask.array<shape=(4, 6), chunksize=(2, 3)>

Current output of xr.open_mfdataset()

<xarray.Dataset>
Dimensions:  (x: 4, y: 12)
Coordinates:
  * x        (x) int64 1 2 3 4
  * y        (y) int64 10 20 30 40 50 60 10 20 30 40 50 60
Data variables:
    foo      (x, y) float64 dask.array<shape=(4, 12), chunksize=(4, 3)>
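
For comparison, the expected result can be assembled by hand with nested calls to xr.concat. The following is only a minimal sketch of the behaviour being requested, not a proposal for how open_mfdataset should implement it. It builds the four tiles in memory rather than reading the files, and the helper name make_tile is hypothetical, introduced here for illustration.

```python
import numpy as np
import xarray as xr


def make_tile(x, y):
    """Build one (2, 3) tile, matching the datasets in the code sample.

    Hypothetical helper, introduced only for this sketch.
    """
    return xr.Dataset(
        {'foo': (('x', 'y'), np.ones((2, 3)))},
        coords={'x': [x, x + 1], 'y': [y, y + 10, y + 20]},
    )


# First join the tiles that share the same x values along y,
# giving one (2, 6) row per starting x value ...
rows = [
    xr.concat([make_tile(x, y) for y in [10, 40]], dim='y')
    for x in [1, 3]
]

# ... then stack those rows along x to obtain the full (4, 6) grid.
combined = xr.concat(rows, dim='x')
print(combined)
```

This relies on knowing in advance which tiles belong to the same row, which is exactly the bookkeeping that an n-dimensional open_mfdataset would need to infer from the coordinate values.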
