Description
What happened?
When a nested `DataTree` is written to Zarr with `dt.to_zarr()`, an empty `Dataset` is created at the root level. I was trying to append an isomorphic `DataTree` to an existing one stored in Zarr. The call fails when the `mode` and `append_dim` parameters are passed, because the root-level `Dataset` has no dimensions or coordinates.
What did you expect to happen?
I wondered why we need an empty `Dataset` at the root level without data or coordinates. However, I am not sure whether this is the best way to merge/concat/append two `DataTree` objects.
Minimal Complete Verifiable Example
import xarray as xr
import numpy as np
import pandas as pd
ds = xr.tutorial.open_dataset("air_temperature").drop_attrs()
# Split the dataset into two halves along time
ds_first_half = ds.isel(time=range(0, len(ds.time) // 2))
ds_second_half = ds.isel(time=range(len(ds.time) // 2, len(ds.time)))
# Create a pressure DataArray
pressure_data = 1000.0 + 5 * np.random.randn(100, 3, 4)
lons = np.linspace(-120, -90, 4)
lats = np.linspace(25, 55, 3)
times = pd.date_range("2018-01-01", periods=100)
pressure = xr.DataArray(
    pressure_data,
    coords=[times, lats, lons],
    dims=["time", "lat", "lon"],
).to_dataset(name="pressure")
# Split the pressure dataset into two halves along time
press_fh = pressure.isel(time=range(0, len(pressure.time) // 2))
press_sh = pressure.isel(time=range(len(pressure.time) // 2, len(pressure.time)))
# First-half DataTree, written as a new Zarr store
dt_fh = xr.DataTree.from_dict(
    {
        "/temp": ds_first_half,
        "/pressure": press_fh,
    }
)
store = "dtree.zarr"
dt_fh.to_zarr(
    store,
    consolidated=True,
)
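# Second-half DataTree, appended to the existing store along time
dt_sh = xr.DataTree.from_dict(
    {
        "/temp": ds_second_half,
        "/pressure": press_sh,
    }
)
# Passing append_dim here raises the ValueError shown in the log output
dt_sh.to_zarr(
    store,
    mode="a-",
    consolidated=True,
    append_dim="time",
)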
MCVE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
This is the error triggered by the empty `Dataset` at the root level:
Traceback (most recent call last):
  File "/snap/pycharm-community/425/plugins/python-ce/helpers/pydev/pydevd.py", line 1570, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/snap/pycharm-community/425/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/media/alfonso/drive/Alfonso/python/raw2zarr/issue-delete.py", line 65, in <module>
    dt_fh.to_zarr(
  File "/home/alfonso/mambaforge/envs/raw2zarr/lib/python3.12/site-packages/xarray/core/datatree.py", line 1699, in to_zarr
    _datatree_to_zarr(
  File "/home/alfonso/mambaforge/envs/raw2zarr/lib/python3.12/site-packages/xarray/core/datatree_io.py", line 123, in _datatree_to_zarr
    ds.to_zarr(
  File "/home/alfonso/mambaforge/envs/raw2zarr/lib/python3.12/site-packages/xarray/core/dataset.py", line 2595, in to_zarr
    return to_zarr( # type: ignore[call-overload,misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alfonso/mambaforge/envs/raw2zarr/lib/python3.12/site-packages/xarray/backends/api.py", line 2184, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/home/alfonso/mambaforge/envs/raw2zarr/lib/python3.12/site-packages/xarray/backends/api.py", line 1920, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/home/alfonso/mambaforge/envs/raw2zarr/lib/python3.12/site-packages/xarray/backends/zarr.py", line 889, in store
    raise ValueError(
ValueError: append_dim='time' does not match any existing dataset dimensions {}
Environment
python: 3.12
xarray version: '2024.10.1.dev59+g700191b9'