DataTree.to_zarr() is very slow writing to high latency store #9455

Open

@slevang

What is your issue?

Repost of xarray-contrib/datatree#277, with some updates.

Test case

Write a tree containing 13 nodes and negligible data to S3/GCS with fsspec:

import numpy as np
import xarray as xr
from xarray.core.datatree import DataTree

ds = xr.Dataset(
    data_vars={
        "a": xr.DataArray(np.ones((2, 2)), coords={"x": [1, 2], "y": [1, 2]}),
        "b": xr.DataArray(np.ones((2, 2)), coords={"x": [1, 2], "y": [1, 2]}),
        "c": xr.DataArray(np.ones((2, 2)), coords={"x": [1, 2], "y": [1, 2]}),
    }
)

# root plus 3 first-level and 9 second-level nodes: 13 groups in total
dt = DataTree()
for first_level in [1, 2, 3]:
    dt[f"{first_level}"] = DataTree(ds)
    for second_level in [1, 2, 3]:
        dt[f"{first_level}/{second_level}"] = DataTree(ds)

# local write
%time dt.to_zarr("test.zarr", mode="w")

# remote write via fsspec (s3:// or gs://)
bucket = "s3|gs://your-bucket/path"
%time dt.to_zarr(f"{bucket}/test.zarr", mode="w")

Gives (local, then remote):

CPU times: user 287 ms, sys: 43.9 ms, total: 331 ms
Wall time: 331 ms
CPU times: user 3.22 s, sys: 219 ms, total: 3.44 s
Wall time: 1min 4s

This is a bit better than in the original issue due to improvements elsewhere in the stack, but still really slow for heavily nested but otherwise small datasets.

Potential Improvements

#9014 did make some decent improvements to read speed. When reading the dataset written above I get:

%timeit xr.backends.api.open_datatree(f"{bucket}/test.zarr", engine="zarr")
882 ms ± 47.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit datatree.open_datatree(f"{bucket}/test.zarr", engine="zarr")
3.47 s ± 86.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

We'll need similar optimizations on the write side. The fundamental issue is that DataTree.to_zarr makes a separate, serial Dataset.to_zarr call for each node:

for node in dt.subtree:
    ds = node.to_dataset(inherited=False)
    group_path = node.path
    if ds is None:
        _create_empty_zarr_group(store, group_path, mode)
    else:
        ds.to_zarr(
            store,
            group=group_path,
            mode=mode,
            encoding=encoding.get(node.path),
            consolidated=False,
            **kwargs,
        )
    if "w" in mode:
        mode = "a"
if consolidated:
    consolidate_metadata(store)

This results in many fsspec calls to list directories, check file existence, and put small metadata and attribute files in the bucket. Here's the snakeviz profile of the example:

[snakeviz profile of the DataTree.to_zarr call to the remote store]

(The 8s block on the right is metadata consolidation)
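
One possible direction would be to issue the per-node writes concurrently rather than serially, since the wall time is dominated by many small, independent object-store requests. The sketch below is purely illustrative and is not how DataTree.to_zarr works today; write_tree_concurrently is a hypothetical helper, and it assumes the node groups can be written independently, with metadata consolidated once at the end:

from concurrent.futures import ThreadPoolExecutor

import zarr


def write_tree_concurrently(dt, store, max_workers=8, **kwargs):
    # Hypothetical sketch: write the root group serially to initialize the
    # store, then fan the remaining group writes out across a thread pool.
    nodes = list(dt.subtree)
    root, children = nodes[0], nodes[1:]
    root.to_dataset(inherited=False).to_zarr(
        store, group=root.path, mode="w", consolidated=False, **kwargs
    )

    def write_node(node):
        # each node writes to its own group, so the requests can overlap
        node.to_dataset(inherited=False).to_zarr(
            store, group=node.path, mode="a", consolidated=False, **kwargs
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(write_node, children))

    # one consolidation pass at the end, as DataTree.to_zarr already does
    zarr.consolidate_metadata(store)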

Workaround

If your data is small enough to dump locally, this works great:

from tempfile import TemporaryDirectory

# fs is an fsspec filesystem, e.g. fs = fsspec.filesystem("gs")
def to_zarr(dt, path):
    with TemporaryDirectory() as tmp_path:
        dt.to_zarr(tmp_path)
        fs.put(tmp_path, path, recursive=True)

Takes about 1s.
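
The speedup comes from fs.put(tmp_path, path, recursive=True) shipping all the small metadata objects as one bulk transfer, which s3fs/gcsfs can batch concurrently instead of paying a round trip per file. A hypothetical invocation for the example above:

to_zarr(dt, f"{bucket}/test.zarr")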
