Broken NaN encoding when writing v2 storage format from v3 library #2741

Closed
@d70-t

Zarr version

3.0.1

Numcodecs version

0.15.0

Python Version

3.12.8

Operating System

Mac

Installation

micromamba create -n zarr3 python 'zarr>=3'

Description

When writing a float array with fill_value=np.nan to a version 2 store, the spec requires the fill value to be encoded as a string (i.e. "NaN"), because JSON has no NaN literal. However, when doing so with zarr>=3, the fill value is not encoded as a string. The resulting .zarray is therefore not valid JSON and cannot be read by strict JSON parsers.
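To illustrate the difference, here is a minimal sketch using only Python's standard json module. The helpers encode_fill_value and decode_fill_value are hypothetical names made up for this example and are not zarr-python API; they only show the string encoding the v2 spec describes for non-finite floats.

```python
import json
import math

def encode_fill_value(value):
    """Hypothetical helper: map non-finite floats to the v2 spec strings."""
    # Only plain Python floats are handled here (np.nan is one);
    # other NumPy scalar types would need converting first.
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
    return value

def decode_fill_value(value):
    """Hypothetical inverse: turn the spec strings back into floats."""
    if value in ("NaN", "Infinity", "-Infinity"):
        return float(value)
    return value

# By default json.dumps writes a bare NaN token, which is not valid JSON.
lenient = json.dumps({"fill_value": float("nan")})

# With allow_nan=False the serializer raises instead of writing invalid JSON.
try:
    json.dumps({"fill_value": float("nan")}, allow_nan=False)
    strict_rejected = False
except ValueError:
    strict_rejected = True

# Encoding the value as a string first yields valid JSON even in strict mode.
encoded = json.dumps({"fill_value": encode_fill_value(float("nan"))}, allow_nan=False)
print(lenient, strict_rejected, encoded)
```

Whether zarr-python hits exactly this default in its metadata serialization is not confirmed here; the sketch only shows how a bare NaN can end up in JSON output and what the string form looks like.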

This behaviour could be a regression of #412.
The bug was originally found by @lkluft.

Steps to reproduce

Using this Python script as check_nan_encoding.py:

import zarr
import numpy as np

import numcodecs
import sys

print("zarr version", zarr.__version__)
print("numcodecs version", numcodecs.__version__)
print("python version", sys.version)

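# Compare the major version numerically; comparing version strings
# lexicographically would misorder versions such as "10.0".
ZARR_V3 = int(zarr.__version__.split(".")[0]) >= 3

def make_array(**kwargs):
    # zarr>=3 writes the v3 format by default, so request v2 explicitly.
    if ZARR_V3:
        return zarr.create_array(zarr_format=2, **kwargs)
    else:
        return zarr.create(**kwargs)

def to_str(buffer):
    # zarr>=3 stores buffer objects; zarr 2 stores raw bytes.
    if ZARR_V3:
        return buffer.to_bytes().decode("utf-8")
    else:
        return buffer.decode("utf-8")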

def get_fill_value_line(store):
    return [line.strip(" ,")
            for line in to_str(store[".zarray"]).split("\n")
            if "fill_value" in line][0]

store = {}
z = make_array(
    store=store,
    shape=(1,),
    chunks=(1,),
    dtype="f4",
    fill_value=np.nan,
)
print(get_fill_value_line(store))

When running in a zarr>=3 environment:

micromamba create -n zarr3 python 'zarr>=3'
micromamba run -n zarr3 python3 check_nan_encoding.py

the following is printed:

zarr version 3.0.1
numcodecs version 0.15.0
python version 3.12.8 | packaged by conda-forge | (main, Dec  5 2024, 14:19:53) [Clang 18.1.8 ]
"fill_value": NaN

This is not valid JSON, because NaN is not a JSON literal.

When running the same code in a zarr>=2,<3 environment:

micromamba env create -n zarr2 -c conda-forge python 'zarr>=2,<3'
micromamba run -n zarr2 python3 check_nan_encoding.py
zarr version 2.18.4
numcodecs version 0.15.0
python version 3.13.1 | packaged by conda-forge | (main, Jan 13 2025, 09:45:31) [Clang 18.1.8 ]
"fill_value": "NaN"

This matches the spec.
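The two outputs can also be checked mechanically. The following is a sketch, not part of the original script: is_strict_json is a hypothetical helper that uses the parse_constant hook of json.loads to reject the NaN, Infinity and -Infinity tokens that Python's parser accepts by default.

```python
import json

def _reject_constant(name):
    # json.loads calls this hook for the tokens NaN, Infinity and -Infinity.
    raise ValueError(f"non-standard JSON constant: {name}")

def is_strict_json(text):
    """Hypothetical helper: True if text parses without non-standard constants."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:
        return False
    return True

# Metadata fragments modelled on the two outputs above.
zarr3_output = '{"fill_value": NaN}'
zarr2_output = '{"fill_value": "NaN"}'

print(is_strict_json(zarr3_output))
print(is_strict_json(zarr2_output))
```

In the reproduction script, the same check could be applied to to_str(store[".zarray"]) instead of the hard-coded fragments.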

Additional output

No response

Labels

bug: Potential issues with the zarr-python library
