Converting a NetCDF to Zarr and back changes the dtype of some variable attributes #10361

Open
@marcronq

What happened?

Suppose a NetCDF file has a variable with an attribute typed as a list of shorts (notice the s suffix on the valid_range attribute in the ncdump output):

$ ncdump -h synthetic_sample.nc
 
netcdf synthetic_sample {
dimensions:
	loc = 2 ;
	time = 4 ;
variables:
	double temperature(loc, time) ;
		temperature:_FillValue = 99s ;
		temperature:valid_range = 0s, 70s ;
		temperature:coordinates = "lat lon reference_time" ;
        ...

If I convert this Dataset to a Zarr store and back to NetCDF, then check the ncdump output again:

In [1]: import xarray as xr

In [2]: ds = xr.open_dataset("synthetic_sample.nc")

In [3]: ds.to_zarr("nc_to_zarr.zarr")

Out[3]: <xarray.backends.zarr.ZarrStore at 0x10b0d7e20>

In [4]: dsz = xr.open_dataset("nc_to_zarr.zarr", engine="zarr")

In [5]: dsz.to_netcdf("zarr_to_nc.nc")

$ ncdump -h zarr_to_nc.nc
 
netcdf zarr_to_nc {
dimensions:
	loc = 2 ;
	time = 4 ;
variables:
	double temperature(loc, time) ;
		temperature:_FillValue = 99s ;
		temperature:valid_range = 0LL, 70LL ;
		temperature:coordinates = "lat lon reference_time" ;
        ...

Note the LL suffix on the valid_range attribute: it was converted to a 64-bit integer (long long) type. _FillValue, however, was kept as a short. Is there a workaround for this issue?

Thanks
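A possible explanation, offered as a sketch rather than a confirmed diagnosis: Zarr stores attributes as JSON, and JSON numbers carry no integer width. The snippet below uses only `json` and NumPy (no xarray or Zarr calls) to show how an int16 array loses its dtype in such a round trip. Whether xarray follows exactly this path is an assumption.

```python
import json

import numpy as np

# Attribute as written to the original NetCDF file: a 16-bit integer array.
valid_range = np.array([0, 70], dtype=np.int16)

# Assumption: the store keeps attributes as JSON, so the array has to become
# a plain list of numbers before it is serialized.
stored = json.dumps({"valid_range": valid_range.tolist()})

# Reading the JSON back yields Python ints with no record of the int16 width.
restored = json.loads(stored)["valid_range"]

# Converting the list to an array again picks NumPy's default integer type,
# which is wider than int16.
roundtripped = np.asarray(restored)

print(valid_range.dtype)   # int16
print(type(restored[0]))   # <class 'int'>
print(roundtripped.dtype)  # platform default integer, not int16
```

This would also be consistent with _FillValue surviving, since it is set through the variable's encoding rather than as a free-form attribute, but that part is a guess.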

What did you expect to happen?

I would expect all attributes to keep their original types, so that valid_range is still written as shorts:

In [1]: import xarray as xr

In [2]: ds = xr.open_dataset("synthetic_sample.nc")

In [3]: ds.to_zarr("nc_to_zarr.zarr")

Out[3]: <xarray.backends.zarr.ZarrStore at 0x10b0d7e20>

In [4]: dsz = xr.open_dataset("nc_to_zarr.zarr", engine="zarr")

In [5]: dsz.to_netcdf("zarr_to_nc.nc")

$ ncdump -h zarr_to_nc.nc
 
netcdf zarr_to_nc {
dimensions:
	loc = 2 ;
	time = 4 ;
variables:
	double temperature(loc, time) ;
		temperature:_FillValue = 99s ;
		temperature:valid_range = 0s, 70s ;
		temperature:coordinates = "lat lon reference_time" ;
        ...

Minimal Complete Verifiable Example

import pandas as pd
import xarray as xr
import numpy as np

# Build a sample dataset
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 4)
precipitation = 10 * np.random.rand(2, 4)
lon = [-99.83, -99.32]
lat = [42.25, 42.21]
time = pd.date_range("2014-09-06", periods=4)
reference_time = pd.Timestamp("2014-09-05")

ds = xr.Dataset(
    data_vars=dict(
        temperature=(["loc", "time"], temperature),
        precipitation=(["loc", "time"], precipitation),
    ),
    coords=dict(
        lon=("loc", lon),
        lat=("loc", lat),
        time=time,
        reference_time=reference_time,
    ),
    attrs=dict(description="Weather related data."),
)

ds.temperature.attrs['valid_range'] = np.array([0, 70]).astype(np.int16)
ds.temperature.encoding['dtype'] = 'int16'
ds.temperature.encoding['_FillValue'] = np.int16(99)

# Test
ds.to_netcdf("synthetic_sample.nc") # temperature:valid_range is short
ds = xr.open_dataset("synthetic_sample.nc")
ds.to_zarr("nc_to_zarr.zarr")
dsz = xr.open_dataset("nc_to_zarr.zarr", engine="zarr")
dsz.to_netcdf("zarr_to_nc.nc") # temperature:valid_range is long
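One workaround that might help until this is resolved is to cast the affected attributes back to their intended dtype before the final `to_netcdf` call. The helper below is hypothetical (`restore_attr_dtypes` is not part of xarray) and works on a plain dict, so it is shown here with a stand-in for the attributes read back from the Zarr store.

```python
import numpy as np


def restore_attr_dtypes(attrs, dtypes):
    """Return a copy of attrs with the named entries cast to the given dtypes.

    Hypothetical helper, not an xarray API. Entries missing from attrs are
    skipped; the input dict is left unchanged.
    """
    fixed = dict(attrs)
    for name, dtype in dtypes.items():
        if name in fixed:
            fixed[name] = np.asarray(fixed[name], dtype=dtype)
    return fixed


# Stand-in for dsz.temperature.attrs after reading the Zarr store, assuming
# valid_range comes back as a plain list of Python ints.
attrs_from_zarr = {"valid_range": [0, 70]}

fixed_attrs = restore_attr_dtypes(attrs_from_zarr, {"valid_range": np.int16})
print(fixed_attrs["valid_range"].dtype)  # int16
```

In the example above this would be applied as `dsz.temperature.attrs = restore_attr_dtypes(dsz.temperature.attrs, {"valid_range": np.int16})` just before `dsz.to_netcdf("zarr_to_nc.nc")`. I have not verified this against the versions listed below, and it requires knowing the intended dtypes in advance, so it does not replace preserving them in the round trip.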

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.12 (main, May 26 2025, 18:00:43) [Clang 17.0.0 (clang-1700.0.13.3)]
python-bits: 64
OS: Darwin
OS-release: 24.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: es_ES.UTF-8
LOCALE: ('es_ES', 'UTF-8')
libhdf5: 1.14.4
libnetcdf: 4.9.2

xarray: 2025.4.0
pandas: 2.2.3
numpy: 2.2.6
scipy: 1.15.3
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.6.1
h5py: 3.13.0
zarr: 3.0.8
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.5.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 9.2.0
sphinx: None
