Error in string data type handling with 0 fill value #2792

Open
@LDeakin

Zarr version

3.0.1

Numcodecs version

Python Version

3.12

Operating System

Linux

Installation

see reproducer

Description

There are Zarr V2 string arrays in the wild with a fill value of 0. zarr-python interprets this fill value as an empty string in some code paths (e.g. when a chunk is only partially written), but reading a chunk that was never written returns the integer 0 for each element rather than "". See the reproducer below.

Steps to reproduce

#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "zarr==3.0.1",
# ]
# ///

import zarr

array = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    dtype=str,
    shape=(5,),
    chunks=(2,),
    filters=zarr.codecs.vlen_utf8.VLenUTF8(),
    compressors=[None],
    fill_value=0,
    zarr_format=2,
    overwrite=True,
)
array[:3] = ["a", "bb", ""]
print(array.info)
# Expected: assert (array[:] == ["a", "bb", "", "", ""]).all()
# Actual: array[:] is ["a", "bb", "", "", 0]
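
For illustration only, the expected behaviour can be sketched in pure Python without zarr. This is not zarr-python's implementation: `normalize_string_fill_value` and `read_all` are hypothetical names, and the rule that 0 (or None) means the empty string for string arrays is an assumption based on the behaviour described above. The point is that the fill value is normalized once, so partially written and completely missing chunks agree.

```python
def normalize_string_fill_value(fill_value):
    """Hypothetical helper: coerce a legacy string-array fill value to a str."""
    # Assumption: for string arrays, 0 (and None) stand for the empty string.
    if fill_value is None or (type(fill_value) is int and fill_value == 0):
        return ""
    if isinstance(fill_value, str):
        return fill_value
    raise TypeError(f"unsupported string fill value: {fill_value!r}")


def read_all(stored_chunks, length, chunk_len, raw_fill_value):
    """Toy 1-D reader: chunks that were never written use the normalized fill value."""
    fill = normalize_string_fill_value(raw_fill_value)
    out = []
    for start in range(0, length, chunk_len):
        n = min(chunk_len, length - start)  # the last chunk may be partial
        chunk = stored_chunks.get(start // chunk_len)
        if chunk is None:
            out.extend([fill] * n)  # missing chunk: fill with "" rather than 0
        else:
            out.extend(chunk[:n])
    return out


# Mirrors the reproducer: 5 elements, chunks of 2, third chunk never written.
result = read_all({0: ["a", "bb"], 1: ["", ""]}, length=5, chunk_len=2, raw_fill_value=0)
print(result)  # ['a', 'bb', '', '', '']
```

Normalizing at a single point is only one possible design. An alternative would be to reject a 0 fill value for string arrays outright, but that would break reading the existing arrays mentioned in the description.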

Additional output

No response

Labels: bug