
Chunk management with datetime64 and timedelta64 data types #8230

Closed
@effeminati

Description

What happened?

I need to perform operations with coordinates or data variables of datetime64[ns] or timedelta64[ns] data types.

Once I save the dataset or DataArray to Zarr, the chunk size is silently changed by the to_zarr() function, even when I explicitly specify the encoding. The chunk size on disk must match the chunk size in memory, because I write each portion of the store using the region option of xarray.Dataset.to_zarr().

In addition, when I try to run this in parallel, I run into an "inconsistent chunk size" error.
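
A rough sketch of that region-based write pattern follows (the store path, variable name, and block size here are illustrative, not taken from my actual pipeline); it only works when the in-memory chunks line up with the chunks on disk:

import dask.array as da
import xarray as xr

# build a dataset chunked in 512-row blocks
ds_demo = xr.Dataset({'aaa': (('y', 'x'), da.zeros((1024, 2048), chunks=512))})

# initialize the store with metadata only, then write one block at a time;
# keeping each region aligned with the on-disk chunk boundaries avoids
# partial-chunk writes
ds_demo.to_zarr('region_demo.zarr', compute=False)
ds_demo.isel(y=slice(0, 512)).to_zarr('region_demo.zarr', mode='r+', region={'y': slice(0, 512)})
ds_demo.isel(y=slice(512, 1024)).to_zarr('region_demo.zarr', mode='r+', region={'y': slice(512, 1024)})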

What did you expect to happen?

I expect the input chunk size to be preserved when writing to and reading from Zarr.

Minimal Complete Verifiable Example

import xarray as xr
import dask.array as da
import numpy as np

# define an empty dataarray
ds = xr.DataArray(da.empty(shape=(1_024,2_048), dtype='float64', chunks=512), dims=['y','x'])

# define coordinates (azimuth_time along y is datetime64[ns], slant_range_time along x is float64)
ds = ds.assign_coords({'azimuth_time': (['y'], np.arange(1024).astype('datetime64[ns]')), 'slant_range_time': (['x'], np.arange(2048).astype('float64'))})

# define chunking
ds = ds.chunk({'x': 512, 'y': 512})

# save dataarray
ds.to_dataset(name='aaa').to_zarr('test.zarr')

# re-read dataarray
ds1 = xr.open_dataset('test.zarr', engine='zarr', chunks={})

# the chunk sizes of the azimuth_time coordinate differ between the written and re-read datasets
print(ds1.aaa.azimuth_time.chunks, ds.azimuth_time.chunks)
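
For reference, this is the kind of explicit chunk encoding I pass to to_zarr() (the dictionary below is illustrative; as far as I understand, 'chunks' is the key the Zarr backend reads for on-disk chunk sizes). Even with an encoding like this, the chunking of the datetime64 coordinate that I read back does not match what I specified:

# explicit per-variable chunk encoding for the Zarr backend
encoding = {
    'aaa': {'chunks': (512, 512)},
    'azimuth_time': {'chunks': (512,)},
}
ds.to_dataset(name='aaa').to_zarr('test_encoded.zarr', encoding=encoding)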

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

In [11]: print(ds1.aaa.azimuth_time.chunks, ds.azimuth_time.chunks)
((1024,),) ((512, 512),)
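
For completeness, the on-disk chunking of the coordinate can also be checked directly with the zarr library (a small sketch, not part of the log above), which helps separate the stored layout from what xarray reports after decoding:

import zarr

# open the store read-only and print the raw chunk shape of the coordinate
store = zarr.open('test.zarr', mode='r')
print(store['azimuth_time'].chunks)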

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-1045-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2

xarray: 2023.6.0
pandas: 2.0.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.15.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.6.1
distributed: 2023.6.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.0.0
pip: 23.1.2
conda: None
pytest: 7.4.0
mypy: 1.4.1
IPython: 8.14.0
sphinx: 7.0.1
