to_zarr raises ValueError: Invalid dtype with mode='a' (but not with mode='w') #6345

Closed
@cisaacstern

What happened?

A dataset in which a data variable has dtype='|S35' can be written to zarr without error as follows:

import xarray as xr
import numpy as np

data = np.zeros((2, 3), dtype='|S35')
ds = xr.DataArray(data, name='foo').to_dataset()
ds.to_zarr('test.zarr', mode='w')

Changing the value of mode from 'w' to 'a' raises ValueError: Invalid dtype for data variable:

!rm -rf test.zarr
ds.to_zarr('test.zarr', mode='a')

Full Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 ds.to_zarr('test.zarr', mode='a')

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/core/dataset.py:2036, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2033 if encoding is None:
   2034     encoding = {}
-> 2036 return to_zarr(
   2037     self,
   2038     store=store,
   2039     chunk_store=chunk_store,
   2040     storage_options=storage_options,
   2041     mode=mode,
   2042     synchronizer=synchronizer,
   2043     group=group,
   2044     encoding=encoding,
   2045     compute=compute,
   2046     consolidated=consolidated,
   2047     append_dim=append_dim,
   2048     region=region,
   2049     safe_chunks=safe_chunks,
   2050 )

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1406, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1391 zstore = backends.ZarrStore.open_group(
   1392     store=mapper,
   1393     mode=mode,
   (...)
   1402     stacklevel=4,  # for Dataset.to_zarr()
   1403 )
   1405 if mode in ["a", "r+"]:
-> 1406     _validate_datatypes_for_zarr_append(dataset)
   1407     if append_dim is not None:
   1408         existing_dims = zstore.get_dimensions()

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1301, in _validate_datatypes_for_zarr_append(dataset)
   1292         raise ValueError(
   1293             "Invalid dtype for data variable: {} "
   1294             "dtype must be a subtype of number, "
   (...)
   1297             "object".format(var)
   1298         )
   1300 for k in dataset.data_vars.values():
-> 1301     check_dtype(k)

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1292, in _validate_datatypes_for_zarr_append.<locals>.check_dtype(var)
   1283 def check_dtype(var):
   1284     if (
   1285         not np.issubdtype(var.dtype, np.number)
   1286         and not np.issubdtype(var.dtype, np.datetime64)
   (...)
   1290     ):
   1291         # and not re.match('^bytes[1-9]+$', var.dtype.name)):
-> 1292         raise ValueError(
   1293             "Invalid dtype for data variable: {} "
   1294             "dtype must be a subtype of number, "
   1295             "datetime, bool, a fixed sized string, "
   1296             "a fixed size unicode string or an "
   1297             "object".format(var)
   1298         )

ValueError: Invalid dtype for data variable: <xarray.DataArray 'foo' (dim_0: 2, dim_1: 3)>
array([[b'', b'', b''],
       [b'', b'', b'']], dtype='|S35')
Dimensions without coordinates: dim_0, dim_1 dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object
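
The visible part of `check_dtype` in the traceback tests the dtype with `np.issubdtype`. The following is a minimal sketch, using NumPy only, of how `|S35` and two possible conversions classify against a few abstract NumPy types. The lines elided from the traceback (`(...)`) are not reproduced here, so this is not a copy of xarray's full check; the names `candidates`, `abstract_types` and `classify` are hypothetical and exist only for this illustration.

```python
import numpy as np

# The dtype from the report, plus two conversions for comparison.
data = np.zeros((2, 3), dtype='|S35')
candidates = {
    'bytes (|S35)': data.dtype,
    'unicode (U35)': data.astype('U35').dtype,
    'object': data.astype(object).dtype,
}

# Abstract NumPy types to test each dtype against (illustrative selection,
# not the list used by xarray).
abstract_types = {
    'number': np.number,
    'datetime64': np.datetime64,
    'bool_': np.bool_,
    'bytes_': np.bytes_,
    'str_': np.str_,
}


def classify(dtype):
    """Return the names of the abstract types that `dtype` is a subtype of."""
    return [name for name, t in abstract_types.items() if np.issubdtype(dtype, t)]


for label, dtype in candidates.items():
    # dtype.kind is 'S' for fixed-size bytes, 'U' for unicode, 'O' for object.
    print(label, dtype.kind, classify(dtype))
```

A fixed-size bytes dtype is a subtype of neither `np.number` nor `np.datetime64`, the two checks shown in the traceback, and it is a different kind from a fixed-size unicode string. Whether `mode='a'` accepts the converted arrays in this xarray version has not been verified here.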

What did you expect to happen?

I would expect mode='w' and mode='a' to behave consistently with respect to the dtypes of data variables.

Minimal Complete Verifiable Example

See What Happened? section above

Relevant log output

See What Happened? section above

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:28:27) 
[Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.0.1
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: 1.5.8
pydap: installed
h5netcdf: 999
h5py: 3.6.0
Nio: None
zarr: 2.11.0
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.8.5
iris: None
bottleneck: None
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 6.2.5
IPython: 8.1.1
sphinx: None

cc @rabernat

Labels: bug, topic-zarr
