What is your issue?
This is an issue to track compatibility between xarray and the in-progress zarr-python data types refactoring effort.
We are working on a new data type model for zarr-python. Why? Zarr-python 2 used numpy dtypes internally, and zarr v2 (the format) also used the numpy data type model. Fitting the spec heavily to numpy proved problematic for zarr implementations in other languages.
Zarr v3 introduced a new data type model that looks much less like numpy dtypes. For example, the v3 spec defines fewer dtypes than numpy supports, and the v3 dtype model doesn't track endianness. So we shipped zarr-python 3 with zarr v3 support for only the data types described in the zarr v3 spec, which left out some important numpy data types:
type | string code | zarr v3 spec |
---|---|---|
fixed-length ascii strings | S | PR |
fixed-length unicode strings | U | PR |
datetime64 | M | numpy.datetime64 |
timedelta64 | m | numpy.timedelta64 |
fixed-length raw bytes | V | None yet |
structured data types | V | None yet |
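For anyone unfamiliar with the string codes in the table, here is a small numpy-only sketch (it does not touch zarr-python or xarray) showing that they correspond to the `kind` attribute of a numpy dtype. The specific dtypes chosen (`S10`, `V8`, the `x`/`y` structured dtype, and so on) are illustrative assumptions, not anything taken from the zarr-python branch. The second half illustrates the endianness point above: numpy treats byte order as part of the dtype.

```python
import numpy as np

# One illustrative numpy dtype per row of the table above.
# The lengths, units, and field names are arbitrary choices for this sketch.
examples = {
    "fixed-length ascii strings": np.dtype("S10"),
    "fixed-length unicode strings": np.dtype("U10"),
    "datetime64": np.dtype("datetime64[ns]"),
    "timedelta64": np.dtype("timedelta64[ns]"),
    "fixed-length raw bytes": np.dtype("V8"),
    "structured data types": np.dtype([("x", "i4"), ("y", "f8")]),
}

# dtype.kind is the single-character code listed in the "string code" column.
kinds = {name: dt.kind for name, dt in examples.items()}
for name, dt in examples.items():
    print(f"{name:30s} kind={dt.kind!r} dtype={dt!r}")

# Raw bytes and structured dtypes share the "V" kind; only the structured
# dtype has named fields.
assert examples["fixed-length raw bytes"].fields is None
assert examples["structured data types"].fields is not None

# Byte order is part of a numpy dtype: these two dtypes describe the same
# logical type (32-bit signed integer) but are distinct dtype objects and
# serialize the value 1 to different bytes.
big = np.dtype(">i4")
little = np.dtype("<i4")
big_bytes = np.array([1], dtype=big).tobytes()
little_bytes = np.array([1], dtype=little).tobytes()
print(big.str, big_bytes)
print(little.str, little_bytes)
```

Note that the two `V` rows cannot be told apart by kind code alone, which is one reason a string code is not enough to identify a data type.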
Support for these missing numpy data types is being added in this PR against zarr-python. It's turned into quite an effort. In parallel with the zarr-python implementation, we are also writing up language-agnostic specs for these data types, so that other zarr implementations can easily support them. See the third column of the table.
I opened a compatibility PR against xarray that sources zarr-python from the new dtypes branch. When the compatibility PR indicates that all tests are passing, and when we are satisfied that there are no remaining questions about the impact of zarr-python's new dtype model on xarray, then we can close this issue.
We are looking to release this functionality in zarr-python 3.1, but I can't give a timeline for that yet. Until then, I'm happy to answer any questions people have about this effort.