This might be just a documentation issue, so sorry if this is not a problem with xarray.
I'm trying to save an intermediate result of a calculation with xarray + dask to disk, and I'd like to preserve the on-disk chunking. Setting the encoding of a `Dataset` data variable or `DataArray` through its `encoding` attribute seems to work for (at least) some encoding keys, but not for `chunksizes`. For example:
```python
import xarray as xr
import dask.array as da

# First generate an array of random numbers
rng = da.random.RandomState()
shape = (10, 10000)
chunks = [10, 10]
dims = ['x', 'y']
z = rng.standard_normal(shape, chunks=chunks)
arr = xr.DataArray(z, dims=dims, name='z')  # 'arr', to avoid shadowing the dask.array import

# Set encoding on the DataArray
arr.encoding['chunksizes'] = chunks  # Not preserved
arr.encoding['zlib'] = True          # Preserved
ds = arr.to_dataset()
print(ds['z'].encoding)  # out: {'chunksizes': [10, 10], 'zlib': True}

# This file is chunked and compressed correctly
ds.to_netcdf('test1.nc', encoding={'z': {'chunksizes': chunks}})
# while this one is only compressed
ds.to_netcdf('test2.nc')
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.5-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.3
scipy: 0.19.0
netCDF4: 1.4.0
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.0.2
cartopy: None
seaborn: 0.7.1
setuptools: 39.1.0
pip: 9.0.1
conda: None
pytest: 3.2.2
IPython: 6.3.1
sphinx: None