Skip to content

Appending to Zarr store on disk changes dimension metadata #11101

@jacobbieker

Description

@jacobbieker

What happened?

I am writing lots of geo data to disk in Zarr and Icechunk, usually through appending to a given Zarr store on disk. I noticed that recently, some of the data values have seemed flipped compared to what the dimension says they should be. I've made a minimal example to show this, the datasets appended in memory are correct, but the one appended to on disk has the level dimension updated to match the last appended timestep, resulting in previous timestep values to not match the dimensions.

What did you expect to happen?

I would expect when appending to a Zarr store on disk, Xarray would align the new data to append to match the current Zarr store's metadata. Or at least throw an error if they don't match, rather than silently updating the store's metadata.

Minimal Complete Verifiable Example

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!

import xarray as xr
xr.show_versions()
import xarray as xr
import numpy as np

var1 = xr.DataArray(
    coords={"time": [0,], "latitude": [0,1], "longitude": [0,1],
            "level": np.arange(0, 5, 1)},
    data=np.arange(0,20,1).reshape((1,2,2,5))
).to_dataset(name="var").chunk({"time": 1, "latitude": -1, "longitude": -1, "level": -1})

var1.to_zarr("test.zarr", compute=True, mode="w")
var1.to_zarr("test2.zarr", compute=True, mode="w")

var2 = xr.DataArray(
    coords={"time": [1,], "latitude": [0,1], "longitude": [0,1],
            "level": np.arange(4, -1, -1)},
    data=np.arange(19,-1,-1).reshape((1,2,2,5))
).to_dataset(name="var").chunk({"time": 1, "latitude": -1, "longitude": -1, "level": -1})
var2.to_zarr("test.zarr", append_dim="time", compute=True, mode="a")
ds = xr.concat([var1, var2], dim="time")
zarr_ds = xr.open_zarr("test.zarr")
zarr_var1_ds = xr.open_zarr("test2.zarr")
print(ds)
print(zarr_ds)
print(zarr_var1_ds)
zarr_ds_var1 = zarr_ds.isel(time=[0,])
assert np.allclose(zarr_ds_var1["var"].values, zarr_var1_ds["var"].values)
xr.testing.assert_equal(ds, zarr_ds)

Steps to reproduce

Run the above script.

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

/home/jacob/assimilation/.pixi/envs/gpu/lib/python3.12/site-packages/zarr/api/asynchronous.py:247: ZarrUserWarning:

Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.

/home/jacob/assimilation/.pixi/envs/gpu/lib/python3.12/site-packages/zarr/api/asynchronous.py:247: ZarrUserWarning:

Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.

/home/jacob/assimilation/check_zarr.py:22: FutureWarning:

In a future version of xarray the default value for join will change from join='outer' to join='exact'. This change will result in the following ValueError: cannot be aligned with join='exact' because index/labels/sizes are not equal along these coordinates (dimensions): 'level' ('level',) The recommendation is to set join explicitly for this case.

<xarray.Dataset> Size: 408B
Dimensions:    (time: 2, latitude: 2, longitude: 2, level: 5)
Coordinates:
  * time       (time) int64 16B 0 1
  * latitude   (latitude) int64 16B 0 1
  * longitude  (longitude) int64 16B 0 1
  * level      (level) int64 40B 0 1 2 3 4
Data variables:
    var        (time, latitude, longitude, level) int64 320B dask.array<chunksize=(1, 2, 2, 5), meta=np.ndarray>
<xarray.Dataset> Size: 408B
Dimensions:    (time: 2, latitude: 2, longitude: 2, level: 5)
Coordinates:
  * time       (time) int64 16B 0 1
  * latitude   (latitude) int64 16B 0 1
  * longitude  (longitude) int64 16B 0 1
  * level      (level) int64 40B 4 3 2 1 0
Data variables:
    var        (time, latitude, longitude, level) int64 320B dask.array<chunksize=(1, 2, 2, 5), meta=np.ndarray>
<xarray.Dataset> Size: 240B
Dimensions:    (time: 1, latitude: 2, longitude: 2, level: 5)
Coordinates:
  * time       (time) int64 8B 0
  * latitude   (latitude) int64 16B 0 1
  * longitude  (longitude) int64 16B 0 1
  * level      (level) int64 40B 0 1 2 3 4
Data variables:
    var        (time, latitude, longitude, level) int64 160B dask.array<chunksize=(1, 2, 2, 5), meta=np.ndarray>
Traceback (most recent call last):
  File "/home/jacob/assimilation/check_zarr.py", line 30, in <module>
    xr.testing.assert_equal(ds, zarr_ds)
  File "/home/jacob/assimilation/.pixi/envs/gpu/lib/python3.12/site-packages/xarray/testing/assertions.py", line 33, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jacob/assimilation/.pixi/envs/gpu/lib/python3.12/site-packages/xarray/testing/assertions.py", line 147, in assert_equal
    assert a.equals(b), formatting.diff_dataset_repr(a, b, "equals")
           ^^^^^^^^^^^
AssertionError: Left and right Dataset objects are not equal
Differing coordinates:
L * level      (level) int64 40B 0 1 2 3 4
R * level      (level) int64 40B 4 3 2 1 0
Differing data variables:
L   var        (time, latitude, longitude, level) int64 320B dask.array<chunksize=(1, 2, 2, 5), meta=np.ndarray>
R   var        (time, latitude, longitude, level) int64 320B dask.array<chunksize=(1, 2, 2, 5), meta=np.ndarray>

Anything else we need to know?

No response

Environment

Details INSTALLED VERSIONS ------------------ commit: None python: 3.12.12 | packaged by conda-forge | (main, Oct 22 2025, 23:25:55) [GCC 14.3.0] python-bits: 64 OS: Linux OS-release: 6.14.0-37-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: None LOCALE: ('C', 'UTF-8') libhdf5: 1.14.6 libnetcdf: 4.9.3 xarray: 2025.12.0 pandas: 2.3.3 numpy: 2.4.1 scipy: 1.17.0 netCDF4: 1.7.4 pydap: None h5netcdf: 1.7.3 h5py: 3.15.1 zarr: 3.1.5 cftime: 1.6.5 nc_time_axis: None iris: None bottleneck: 1.6.0 dask: 2025.12.0 distributed: 2025.12.0 matplotlib: 3.10.8 cartopy: 0.25.0 seaborn: 0.13.2 numbagg: None fsspec: 2025.12.0 cupy: None pint: 0.25.2 sparse: None flox: 0.10.8 numpy_groupies: 0.11.3 setuptools: 80.9.0 pip: 25.3 conda: None pytest: 8.4.2 mypy: None IPython: 9.9.0 sphinx: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugneeds triageIssue that has not been reviewed by xarray team member

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions