Description
What is your issue?
Summary
The netcdf4-python API docs say the following:
If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently zlib, szip, zstd, bzip2, blosc_lz, blosc_lz4, blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib and szip use the HDF5 plugin architecture.
If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is deprecated in favor of compression='zlib'.
Although compression is considered a valid encoding option by Xarray
xarray/xarray/backends/netCDF4_.py
Lines 232 to 242 in bbe63ab
...it appears that we silently ignore the compression option when creating new netCDF4 variables:
xarray/xarray/backends/netCDF4_.py
Lines 488 to 501 in bbe63ab
Code example
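import numpy as np
import xarray as xr

shape = (10, 20)
chunksizes = (1, 10)

# Request zlib compression via the newer compression= style keyword.
encoding = {
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes
}

da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"}
)
da.encoding = encoding
ds = da.to_dataset()

fname = "test.nc"
ds.to_netcdf(fname, engine="netcdf4", mode="w")

# Read the file back and inspect the encoding that was actually written.
# display() is available in Jupyter/IPython; use print() elsewhere.
with xr.open_dataset(fname, engine="netcdf4") as ds1:
    display(ds1.foo.encoding)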
{'zlib': False,
'szip': False,
'zstd': False,
'bzip2': False,
'blosc': False,
'shuffle': False,
'complevel': 0,
'fletcher32': False,
'contiguous': False,
'chunksizes': (1, 10),
'source': 'test.nc',
'original_shape': (10, 20),
'dtype': dtype('float64'),
'_FillValue': nan}
In addition to showing that compression is ignored, this also reveals several other encoding options that are not available when writing data from xarray (szip, zstd, bzip2, blosc).
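Until compression is honored, one possible workaround is to rewrite the encoding into the legacy zlib=True form before writing. The following is only a minimal sketch of that idea: the helper name to_legacy_zlib_encoding is hypothetical and not part of xarray or netCDF4, and it covers only the zlib case because that is the only compressor the legacy keyword describes.

```python
def to_legacy_zlib_encoding(encoding):
    """Return a copy of `encoding` with compression='zlib' rewritten as zlib=True.

    Hypothetical helper for illustration; not part of xarray or netCDF4.
    """
    legacy = dict(encoding)  # copy so the caller's dict is left untouched
    compression = legacy.pop("compression", None)
    if compression is None:
        # Nothing to translate.
        return legacy
    if compression != "zlib":
        # The legacy keyword can only express zlib compression.
        raise ValueError(
            f"compression={compression!r} has no legacy zlib=True equivalent"
        )
    legacy["zlib"] = True
    return legacy


encoding = {"compression": "zlib", "shuffle": True, "complevel": 8}
print(to_legacy_zlib_encoding(encoding))
# {'shuffle': True, 'complevel': 8, 'zlib': True}
```

In the example above, the result could be assigned to da.encoding in place of the original dict before calling to_netcdf.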
Proposal
We should align with the recommendation from the netcdf4 docs and support compression= style encoding in the netCDF4 backend. We should deprecate the zlib=True syntax.
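One way the deprecation could look is sketched below. It is an illustration under assumptions, not the actual xarray implementation: the function name normalize_compression, the warning text, and the handling of conflicting options are all hypothetical choices.

```python
import warnings


def normalize_compression(encoding):
    """Map the legacy zlib flag onto compression='zlib', with a warning.

    Hypothetical helper for illustration; not part of xarray or netCDF4.
    """
    normalized = dict(encoding)  # do not mutate the caller's dict
    use_zlib = normalized.pop("zlib", None)
    if use_zlib is None:
        # Legacy keyword not used; pass the encoding through unchanged.
        return normalized

    warnings.warn(
        "The 'zlib' encoding option is deprecated; use compression='zlib'.",
        DeprecationWarning,
        stacklevel=2,
    )
    if use_zlib:
        # Keep an explicit compression= value, but reject contradictions.
        chosen = normalized.setdefault("compression", "zlib")
        if chosen != "zlib":
            raise ValueError(
                f"zlib=True conflicts with compression={chosen!r}"
            )
    return normalized
```

With this shape, existing code that passes zlib=True would keep working while being steered towards the compression= keyword, and the other compressors (szip, zstd, bzip2, blosc) would flow through the same key.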