Closed
Description
Zarr version
v3
Numcodecs version
n/a
Python Version
n/a
Operating System
n/a
Installation
n/a
Description
As part of getting xarray ready for zarr v3, I'm looking at how to handle the codec and filter API.
The primary / first place this is accessed is https://github.com/pydata/xarray/blob/1c6300c415efebac15f5ee668a3ef6419dbeab63/xarray/backends/zarr.py#L555-L556, which just reads the values of `.filters` and `.compressor` to place them in the `DataArray.encoding`. A few questions:
- I'd like to add a `.codecs` property to the `CodecPipeline` ABC. This is fine for the `BatchedCodecPipeline`, which AFAICT is the only actual codec pipeline. Does anyone foresee an issue with that? I'm not sure why that class is abstract and loadable through the config.
- Is it fair to say that `filters` is the same as `array_array_codecs`?
- Is it fair to say that `compressor` is the same as `array_bytes_codecs`?
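
To make the first question concrete, here is a rough sketch of what a `.codecs` property on the pipeline ABC could look like. The classes are simplified stand-ins, not zarr's actual `CodecPipeline` or `BatchedCodecPipeline`; `StubCodec`, the `*Sketch` names, and the `bytes_bytes_codecs` field are assumptions for illustration, and the concatenation order is only a guess at what would be useful.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class StubCodec:
    """Hypothetical stand-in for a codec; it only carries a name."""

    name: str


class PipelineSketch(ABC):
    """Stand-in for the pipeline ABC, with the proposed property added."""

    @property
    @abstractmethod
    def codecs(self) -> tuple[StubCodec, ...]:
        """All codecs, in the order they would be applied when encoding."""


@dataclass(frozen=True)
class BatchedPipelineSketch(PipelineSketch):
    """Stand-in for the one concrete pipeline; field names are assumed."""

    array_array_codecs: tuple[StubCodec, ...] = ()
    array_bytes_codecs: tuple[StubCodec, ...] = ()
    bytes_bytes_codecs: tuple[StubCodec, ...] = ()

    @property
    def codecs(self) -> tuple[StubCodec, ...]:
        # Concatenate the three groups into one flat, ordered tuple.
        return (
            self.array_array_codecs
            + self.array_bytes_codecs
            + self.bytes_bytes_codecs
        )
```

With something like this, a consumer such as xarray would only need `pipeline.codecs` and would not have to know how a given pipeline groups its codecs internally.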
There's also https://github.com/pydata/xarray/blob/1c6300c415efebac15f5ee668a3ef6419dbeab63/xarray/backends/zarr.py#L79, which accesses `Codec.codec_id`. I'm not sure yet how to handle that, but right now the best option is maybe `.to_dict()["name"]` (or we could have `.to_dict()` include a `codec_id`)?
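
For illustration, the lookup on the xarray side could be a small helper that accepts either style of codec. This is only a sketch: `codec_identifier` and the two stub classes are hypothetical, and it assumes that older codecs expose a `codec_id` attribute while v3 codecs return a dict with a `"name"` key from `to_dict()`.

```python
from typing import Any


def codec_identifier(codec: Any) -> str:
    """Hypothetical helper: return an identifier for either codec style."""
    # Older-style codecs are assumed to expose ``codec_id`` directly.
    codec_id = getattr(codec, "codec_id", None)
    if codec_id is not None:
        return str(codec_id)
    # Otherwise assume ``to_dict()`` returns a mapping with a "name" key.
    return str(codec.to_dict()["name"])


class OldStyleStub:
    """Stand-in for a codec that has a ``codec_id`` attribute."""

    codec_id = "zlib"


class NewStyleStub:
    """Stand-in for a codec that only offers ``to_dict()``."""

    def to_dict(self) -> dict[str, Any]:
        return {"name": "gzip", "configuration": {"level": 5}}
```

Here `codec_identifier(OldStyleStub())` takes the attribute path and `codec_identifier(NewStyleStub())` falls back to `to_dict()["name"]`.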
Steps to reproduce
n/a
Additional output
No response