Xarray does not support the full range of netcdf4-python compression options #7388

Closed
@rabernat

What is your issue?

Summary

The netcdf4-python API docs say the following:

If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently zlib,szip,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib and szip use the HDF5 plugin architecture.

If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is deprecated in favor of compression='zlib'.

Although compression is considered a valid encoding option by Xarray:

valid_encodings = {
    "zlib",
    "complevel",
    "fletcher32",
    "contiguous",
    "chunksizes",
    "shuffle",
    "_FillValue",
    "dtype",
    "compression",
}

...it appears that we silently ignore the compression option when creating new netCDF4 variables:

nc4_var = self.ds.createVariable(
    varname=name,
    datatype=datatype,
    dimensions=variable.dims,
    zlib=encoding.get("zlib", False),
    complevel=encoding.get("complevel", 4),
    shuffle=encoding.get("shuffle", True),
    fletcher32=encoding.get("fletcher32", False),
    contiguous=encoding.get("contiguous", False),
    chunksizes=encoding.get("chunksizes"),
    endian="native",
    least_significant_digit=encoding.get("least_significant_digit"),
    fill_value=fill_value,
)
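
As a rough illustration of what forwarding the option could look like, here is a minimal sketch that collects the storage-related keyword arguments, including compression, from an encoding dict. The helper name create_variable_kwargs is hypothetical and is not part of xarray; the only assumption about netCDF4 is that createVariable accepts a compression keyword, as the docs quoted above describe.

```python
# Hypothetical helper (not xarray's actual code): gather the storage-related
# keyword arguments that would be passed on to netCDF4's createVariable.
def create_variable_kwargs(encoding):
    return {
        # The option that the call above never forwards.
        "compression": encoding.get("compression"),
        # Same defaults as in the createVariable call above.
        "zlib": encoding.get("zlib", False),
        "complevel": encoding.get("complevel", 4),
        "shuffle": encoding.get("shuffle", True),
        "fletcher32": encoding.get("fletcher32", False),
        "contiguous": encoding.get("contiguous", False),
        "chunksizes": encoding.get("chunksizes"),
    }


kwargs = create_variable_kwargs({"compression": "zstd", "complevel": 8})
print(kwargs["compression"], kwargs["complevel"])
```

The resulting dict could then be splatted into the createVariable call in place of the individual encoding.get(...) arguments.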

Code example

import numpy as np
import xarray as xr

shape = (10, 20)
chunksizes = (1, 10)

encoding = {
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes
}

da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"}
)
da.encoding = encoding
ds = da.to_dataset()

fname = "test.nc"
ds.to_netcdf(fname, engine="netcdf4", mode="w")

with xr.open_dataset(fname, engine="netcdf4") as ds1:
    display(ds1.foo.encoding)

{'zlib': False,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': False,
 'complevel': 0,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (1, 10),
 'source': 'test.nc',
 'original_shape': (10, 20),
 'dtype': dtype('float64'),
 '_FillValue': nan}

In addition to showing that compression is ignored, this also reveals several other encoding options that are not available when writing data from xarray (szip, zstd, bzip2, blosc).
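
To make the discrepancy explicit, the requested encoding can be compared against what was read back. The sketch below uses a hypothetical helper, encoding_mismatches, and assumes that a compressor named in compression shows up in the read-back encoding as a truthy flag under the compressor's name (with the blosc_* variants reported under blosc), which matches the keys shown in the output above. Only a subset of the read-back encoding is reproduced here.

```python
# Hypothetical helper: report requested encoding options that did not
# survive the round trip through the file.
def encoding_mismatches(requested, actual):
    mismatches = {}
    for key, want in requested.items():
        if key == "compression":
            if want is None:
                continue
            # Assumption: blosc_* compressors are reported under "blosc".
            flag = "blosc" if want.startswith("blosc_") else want
            got = actual.get(flag)
            if not got:
                mismatches[key] = (want, got)
        elif actual.get(key) != want:
            mismatches[key] = (want, actual.get(key))
    return mismatches


requested = {
    "compression": "zlib",
    "shuffle": True,
    "complevel": 8,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": (1, 10),
}
# Subset of the encoding read back from test.nc in the example above.
actual = {
    "zlib": False,
    "szip": False,
    "zstd": False,
    "bzip2": False,
    "blosc": False,
    "shuffle": False,
    "complevel": 0,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": (1, 10),
}
print(encoding_mismatches(requested, actual))
```

For the example above this flags compression, shuffle, and complevel, while fletcher32, contiguous, and chunksizes round-trip as requested.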

Proposal

We should align with the recommendation in the netcdf4-python docs and support compression=-style encoding when writing netCDF files. We should also deprecate the zlib=True syntax.
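
One possible shape for the deprecation path is sketched below. The function normalize_compression and the SUPPORTED_COMPRESSION set are hypothetical names introduced for illustration; the set is copied from the netcdf4-python docs quoted in the summary, and the conflict handling is an assumption rather than an agreed design.

```python
import warnings

# Compressor names copied from the netcdf4-python docs quoted above.
SUPPORTED_COMPRESSION = {
    "zlib", "szip", "zstd", "bzip2",
    "blosc_lz", "blosc_lz4", "blosc_lz4hc", "blosc_zlib", "blosc_zstd",
}


def normalize_compression(encoding):
    """Return a copy of `encoding` with legacy zlib=True mapped to compression='zlib'."""
    enc = dict(encoding)
    legacy_zlib = enc.pop("zlib", None)
    compression = enc.pop("compression", None)

    if legacy_zlib:
        warnings.warn(
            "zlib=True is deprecated; use compression='zlib' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if compression is None:
            compression = "zlib"
        elif compression != "zlib":
            # Assumption: conflicting requests are an error, not a silent override.
            raise ValueError(f"zlib=True conflicts with compression={compression!r}")

    if compression is not None:
        if compression not in SUPPORTED_COMPRESSION:
            raise ValueError(f"unsupported compression: {compression!r}")
        enc["compression"] = compression
    return enc


print(normalize_compression({"compression": "zstd", "complevel": 8}))
```

Raising on an unknown compressor name, rather than dropping it, would avoid the silent-ignore behaviour described in this issue.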
