Open
Description
Zarr version
v3
Numcodecs version
na
Python Version
na
Operating System
na
Installation
na
Description
As mentioned in #2323 (comment), we currently can't create a fixed-width string dtype in Zarr v3.
In [1]: import zarr
In [2]: arr = zarr.create(shape=(3,), dtype="U3")
In [3]: arr[:] = ['a', 'bb', 'ccc']
In [4]: arr[:]
Out[4]: array(['a', 'bb', 'ccc'], dtype=StringDType())
We would want the NumPy dtype of that array to be `U3`, a fixed-width Unicode string dtype, rather than the variable-width `StringDType()` it has now. We'd want to support this in addition to the variable-width strings currently used. Some initial questions I don't know the answer to:
- What `data_type` shows up in the metadata?
- What codecs are needed?
- How are the actual bytes stored? In Parquet, `fixed_len_byte_array` is one of the primitive types.
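For context on the last question, here is a minimal sketch of how NumPy itself lays out a `U3` array in memory. It uses only NumPy and makes no claim about how Zarr v3 would or should encode the data. The explicit little-endian spelling `"<U3"` and the variable names are my own choices, made so that the byte layout is deterministic.

```python
import numpy as np

# Fixed-width Unicode: every element occupies the same number of bytes.
# "<U3" = little-endian, 3 code points per element (assumption: byte order
# pinned explicitly for this illustration).
fixed = np.array(["a", "bb", "ccc"], dtype="<U3")

# NumPy stores each code point as 4 bytes (UTF-32), so 3 * 4 = 12 bytes.
print(fixed.dtype.itemsize)

# The raw buffer is 3 elements * 12 bytes; shorter strings are padded
# with NUL code points up to the fixed width.
raw = fixed.tobytes()
print(len(raw))
print(raw[:12])

# Round-tripping needs only the raw bytes plus the dtype (width and byte order).
restored = np.frombuffer(raw, dtype="<U3")
print(restored.tolist())
```

If this layout were stored as-is, the metadata would need to record at least the width and the byte order, since neither can be recovered from the bytes alone.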
Steps to reproduce
.
Additional output
No response