What is your issue?
I'm doing some benchmarks on Xarray + Zarr vs. some other formats, and I'm getting quite a surprising result: on a very simple array, xarray adds a lot of overhead to reading a Zarr array.
Here's a quick script, super simple, just a single chunk. It's 800MB of data, so not some tiny array where reading a metadata JSON file or allocating an index is going to skew the results.
import numpy as np
import zarr
import xarray as xr
import dask
print(zarr.__version__, xr.__version__, dask.__version__)
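# Write an ~800 MB float64 array (10000 x 10000) as a single Zarr chunk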
(
xr.DataArray(np.random.rand(10000, 10000), name="foo")
.to_dataset()
.chunk(None)
.to_zarr("test.zarr", mode="w")
)
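# Time a full read through xarray vs. reading the Zarr array directly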
%timeit xr.open_zarr("test.zarr").compute()
%timeit zarr.open("test.zarr")["foo"][:]
2.17.2 2024.5.1.dev37+gce196d56 2024.5.2
551 ms ± 15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
183 ms ± 2.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So:
- 551ms for xarray
- 183ms for zarr
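One way to narrow this down, a sketch assuming the extra time comes from dask rather than xarray's backend itself: open_zarr accepts chunks=None, which loads through xarray's lazy-indexing machinery without building a dask graph, so timing that path splits the overhead between the two layers.

# Sketch: skip dask entirely (chunks=None) to see how much of the
# overhead is xarray's backend vs. dask's scheduler
%timeit xr.open_zarr("test.zarr", chunks=None)["foo"].values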
Having a quick look with py-spy suggests there might be some thread contention, but I'm not sure how much is really contention vs. idle threads waiting.
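To test the contention hypothesis directly, a minimal sketch, assuming the default threaded scheduler is in play: re-time the read on dask's single-threaded scheduler, and if the gap shrinks, thread contention (or idle waiting) is implicated. Dataset.compute forwards keyword arguments to dask.compute, so the scheduler can be passed inline:

# Sketch: force dask's single-threaded scheduler to rule threading in or out
%timeit xr.open_zarr("test.zarr").compute(scheduler="synchronous")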
Making the array 10x bigger (with 10 chunks) reduces the relative difference, but it's still fairly large:
2.17.2 2024.5.1.dev37+gce196d56 2024.5.2
6.88 s ± 353 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.15 s ± 264 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
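For reference, a sketch of how the larger benchmark might be set up; the exact shape and store name here are assumptions, scaling the first dimension 10x and splitting it into 10 chunks:

# Sketch of the 10x benchmark: ~8 GB split into 10 chunks along dim_0
(
    xr.DataArray(np.random.rand(100000, 10000), name="foo")
    .to_dataset()
    .chunk({"dim_0": 10000})  # 10 chunks of ~800 MB each
    .to_zarr("test_big.zarr", mode="w")
)
%timeit xr.open_zarr("test_big.zarr").compute()
%timeit zarr.open("test_big.zarr")["foo"][:]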
Any thoughts on what might be happening? Is the benchmark at least correct?