Description
What happened:
Opened a dataset using open_zarr, sliced the dataset, and then tried to resave to a zarr store using to_zarr.
What you expected to happen:
The file would save without needing to explicitly modify any encoding dictionary values.
Minimal Complete Verifiable Example:
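import xarray as xr

# Write a small dataset to zarr with chunks of size 2 along dimA
ds = xr.Dataset({"data": (("dimA", ), [10, 20, 30, 40])}, coords={"dimA": [1, 2, 3, 4]})
ds = ds.chunk({"dimA": 2})
ds.to_zarr("test.zarr", consolidated=True, mode="w")
# Reopen, select non-adjacent labels (dask chunks become (1, 1)), then try to resave
ds2 = xr.open_zarr("test.zarr", consolidated=True).sel(dimA=[1,3]).persist()
ds2.to_zarr("test2.zarr", consolidated=True, mode="w")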
This raises:
NotImplementedError: Specified zarr chunks encoding['chunks']=(2,) for variable named 'data' would overlap multiple dask chunks ((1, 1),). This is not implemented in xarray yet. Consider either rechunking using `chunk()` or instead deleting or modifying `encoding['chunks']`.
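For anyone hitting the same message, here is a rough sketch of the condition it describes. This is a simplified, single-dimension illustration and not xarray's actual implementation; the function name is made up for this example. The idea is that the stored encoding still asks for zarr chunks of size 2, while the sliced array has dask chunks (1, 1), so one zarr chunk would have to be written by two separate dask chunks.

```python
from itertools import accumulate


def zarr_chunk_overlaps_dask_chunks(zarr_chunk, dask_chunks):
    """Return True if a zarr chunk would straddle a dask chunk boundary.

    Hypothetical helper for illustration only (one dimension, not xarray code).
    """
    # Positions where one dask chunk ends and the next begins (interior only).
    boundaries = list(accumulate(dask_chunks))[:-1]
    # A boundary that is not a multiple of the zarr chunk size falls inside
    # a zarr chunk, so that zarr chunk is shared by two dask chunks.
    return any(boundary % zarr_chunk != 0 for boundary in boundaries)


# Original dataset: dask chunks (2, 2) line up with zarr chunks of size 2.
before_slicing = zarr_chunk_overlaps_dask_chunks(2, (2, 2))
# After .sel(dimA=[1, 3]): dask chunks (1, 1), but encoding still says 2.
after_slicing = zarr_chunk_overlaps_dask_chunks(2, (1, 1))
print(before_slicing, after_slicing)
```

Writing one zarr chunk from several dask tasks in parallel is presumably what xarray is guarding against here, which would explain why the stale encoding triggers the error even though the data itself is fine.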
Anything else we need to know?:
Not sure if there is a good way around this (or perhaps this is even desired behavior?), but figured I would flag it as it seemed unexpected and took us a second to diagnose. Once you've loaded the data from a zarr store, I feel like the default behavior should probably be to forget the encodings used to save that zarr, treating the in-memory dataset object just like any other in-memory dataset object that could have been loaded from any source. But maybe I'm in the minority or missing some nuance about why you'd want the encoding to hang around.
Environment:
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.89+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.17.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.2
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.7.1
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.2.2
cfgrib: 0.9.9.0
iris: 3.0.1
bottleneck: 1.3.2
dask: 2021.04.1
distributed: 2021.04.1
matplotlib: 3.4.1
cartopy: 0.19.0
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: 3.5.4