Description
What happened?
While creating a Zarr store on S3 containing many datasets, organized in groups and subgroups, I noticed that each write got progressively slower.
Profiling the S3 calls showed many requests targeting groups in the store that were unrelated to the dataset being inserted.
I extracted the bulk of the logic into the reproducer below:
- a store created with zarr.storage.FSStore
- datasets being added using dataset.to_zarr(store, group=...)
I added an fsspec filesystem subclass that logs calls to the filesystem (wrapping local storage).
If you run the reproducer, you'll see that for each insertion of a dataset at a new group, metadata and/or arrays of previous groups get listed or opened.
On S3, this leads to very long execution times and a very high volume of S3 API calls.
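To put a number on "slower and slower", here is a rough model of the store-wide scan that appears at the end of each write. It is only a sketch: the per-group counts are read off the log extract further down (5 directories listed and 10 metadata files opened per existing group, plus one of each for the root), and it assumes every earlier insertion triggers the same pattern. The function names are made up for illustration.

```python
def scan_calls(existing_groups: int, arrays_per_group: int = 4) -> tuple[int, int]:
    """Estimate (listings, opens) for the scan at the end of one write.

    Assumes each group holds `arrays_per_group` arrays (x, a, b, array in the
    reproducer) and that every existing group is scanned, as in the log.
    """
    dirs_per_group = 1 + arrays_per_group       # the group directory + one per array
    meta_per_group = 2 + 2 * arrays_per_group   # .zattrs/.zgroup + .zarray/.zattrs per array
    listings = 1 + existing_groups * dirs_per_group   # +1 for the store root
    opens = 1 + existing_groups * meta_per_group      # +1 for the root .zgroup
    return listings, opens


def total_scan_calls(n_groups: int) -> int:
    """Sum the scan calls over n_groups successive insertions."""
    return sum(sum(scan_calls(k)) for k in range(1, n_groups + 1))


print(scan_calls(10))         # (51, 101), matching the extract for group9
print(total_scan_calls(10))   # 845
print(total_scan_calls(100))  # 75950
```

Under these assumptions the scan cost per write grows linearly with the number of existing groups, so the total grows quadratically with the number of groups written.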
What did you expect to happen?
List and open operations should target only the group being written and the root of the Zarr store.
Minimal Complete Verifiable Example
import xarray as xr
import zarr
import zarr.storage
import numpy as np
import fsspec
from datetime import datetime
import time
from fsspec.spec import AbstractFileSystem
from fsspec.implementations.local import LocalFileSystem
ds = xr.DataArray(
    np.random.rand(1000),
    dims=["x"],
    coords={
        "x": range(1000),
        "a": 0,
        "b": 1,
    },
    name="array",
).to_dataset()

class InstrumentedFS(AbstractFileSystem):
    """A wrapper that logs calls made to the wrapped filesystem."""

    def __init__(
        self,
        fs: LocalFileSystem,
    ):
        super().__init__()
        self._fs = fs

    def to_json(self):
        pass

    def _open(
        self,
        path,
        mode="rb",
        block_size=None,
        **kwargs,
    ):
        print(f"Opening {path}")
        return self._fs._open(path, mode, block_size, **kwargs)

    @property
    def fsid(self):
        return self._fs.fsid

    def ls(self, path, detail=False, **kwargs):
        print(f"Listing {path}")
        return self._fs.ls(path, detail, **kwargs)

    def cp_file(self, path1, path2, **kwargs):
        print(f"Copying {path1} to {path2}")
        self._fs.cp_file(path1, path2, **kwargs)

    def _rm(self, path):
        self._fs._rm(path)

    def created(self, path):
        print(f"called created {path}")
        return self._fs.created(path)

    def modified(self, path):
        print(f"called modified {path}")
        return self._fs.modified(path)

    def sign(self, path, expiration=100, **kwargs):
        return self._fs.sign(path, expiration, **kwargs)

    def mkdir(self, path, create_parents=True, **kwargs):
        print(f"called mkdir {path}")
        return self._fs.mkdir(path, create_parents, **kwargs)

    def makedirs(self, path, exist_ok=False):
        print(f"called makedirs {path}")
        self._fs.makedirs(path, exist_ok)

    def rmdir(self, path):
        self._fs.rmdir(path)

    def info(self, path, **kwargs):
        print(f"called info {path}")
        return self._fs.info(path, **kwargs)

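# Single quotes inside the f-string keep this valid on Python < 3.12.
path = f"/tmp/test_{datetime.now().strftime('%Y%m%d%H%M')}.zarr"
print(path)

# Wrap the local filesystem so every call is logged.
fs = fsspec.open(path).fs
ifs = InstrumentedFS(fs=fs)
store = zarr.storage.FSStore(
    url=path,
    mode="w",
    fs=ifs,
    create=True,
)

# Write the same dataset into ten sibling groups of the same store.
for i in range(10):
    print("----------------------------------")
    print(f"group {i}")
    ds.to_zarr(
        store=store,
        group="group" + str(i),
        encoding={"x": {"chunks": (-1, -1)}},
    )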
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Here is an extract of the log output, captured while adding the tenth dataset (group9).
----------------------------------
group 9
called info /tmp/test_202311281818.zarr/group9/.zarray
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/.zarray
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zarray
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zgroup
called makedirs /tmp/test_202311281818.zarr/group9
Opening /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zarray
Opening /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/x/.zarray
called info /tmp/test_202311281818.zarr/group9/x/.zgroup
called info /tmp/test_202311281818.zarr/group9/a/.zarray
called info /tmp/test_202311281818.zarr/group9/a/.zgroup
called info /tmp/test_202311281818.zarr/group9/b/.zarray
called info /tmp/test_202311281818.zarr/group9/b/.zgroup
called info /tmp/test_202311281818.zarr/group9/array/.zarray
called info /tmp/test_202311281818.zarr/group9/array/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zattrs
called makedirs /tmp/test_202311281818.zarr/group9
Opening /tmp/test_202311281818.zarr/group9/.zattrs
called info /tmp/test_202311281818.zarr/group9/x/.zarray
called info /tmp/test_202311281818.zarr/group9/x/.zgroup
called info /tmp/test_202311281818.zarr/.zarray
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zarray
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/x/.zarray
called info /tmp/test_202311281818.zarr/group9/x/.zgroup
called info /tmp/test_202311281818.zarr/group9/x/.zarray
called makedirs /tmp/test_202311281818.zarr/group9/x
Opening /tmp/test_202311281818.zarr/group9/x/.zarray
Opening /tmp/test_202311281818.zarr/group9/x/.zarray
called info /tmp/test_202311281818.zarr/group9/x/.zattrs
called makedirs /tmp/test_202311281818.zarr/group9/x
Opening /tmp/test_202311281818.zarr/group9/x/.zattrs
Opening /tmp/test_202311281818.zarr/group9/x/0
called info /tmp/test_202311281818.zarr/group9/a/.zarray
called info /tmp/test_202311281818.zarr/group9/a/.zgroup
called info /tmp/test_202311281818.zarr/.zarray
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zarray
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/a/.zarray
called info /tmp/test_202311281818.zarr/group9/a/.zgroup
called info /tmp/test_202311281818.zarr/group9/a/.zarray
called makedirs /tmp/test_202311281818.zarr/group9/a
Opening /tmp/test_202311281818.zarr/group9/a/.zarray
Opening /tmp/test_202311281818.zarr/group9/a/.zarray
called info /tmp/test_202311281818.zarr/group9/a/.zattrs
called makedirs /tmp/test_202311281818.zarr/group9/a
Opening /tmp/test_202311281818.zarr/group9/a/.zattrs
Opening /tmp/test_202311281818.zarr/group9/a/0
called info /tmp/test_202311281818.zarr/group9/a/0
called makedirs /tmp/test_202311281818.zarr/group9/a
Opening /tmp/test_202311281818.zarr/group9/a/0
called info /tmp/test_202311281818.zarr/group9/array/.zarray
called info /tmp/test_202311281818.zarr/group9/array/.zgroup
called info /tmp/test_202311281818.zarr/.zarray
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zarray
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/array/.zarray
called info /tmp/test_202311281818.zarr/group9/array/.zgroup
called info /tmp/test_202311281818.zarr/group9/array/.zarray
called makedirs /tmp/test_202311281818.zarr/group9/array
Opening /tmp/test_202311281818.zarr/group9/array/.zarray
Opening /tmp/test_202311281818.zarr/group9/array/.zarray
called info /tmp/test_202311281818.zarr/group9/array/.zattrs
called makedirs /tmp/test_202311281818.zarr/group9/array
Opening /tmp/test_202311281818.zarr/group9/array/.zattrs
Opening /tmp/test_202311281818.zarr/group9/array/0
called info /tmp/test_202311281818.zarr/group9/b/.zarray
called info /tmp/test_202311281818.zarr/group9/b/.zgroup
called info /tmp/test_202311281818.zarr/.zarray
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zarray
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/.zgroup
called info /tmp/test_202311281818.zarr/group9/b/.zarray
called info /tmp/test_202311281818.zarr/group9/b/.zgroup
called info /tmp/test_202311281818.zarr/group9/b/.zarray
called makedirs /tmp/test_202311281818.zarr/group9/b
Opening /tmp/test_202311281818.zarr/group9/b/.zarray
Opening /tmp/test_202311281818.zarr/group9/b/.zarray
called info /tmp/test_202311281818.zarr/group9/b/.zattrs
called makedirs /tmp/test_202311281818.zarr/group9/b
Opening /tmp/test_202311281818.zarr/group9/b/.zattrs
Opening /tmp/test_202311281818.zarr/group9/b/0
called info /tmp/test_202311281818.zarr/group9/b/0
called makedirs /tmp/test_202311281818.zarr/group9/b
Opening /tmp/test_202311281818.zarr/group9/b/0
Listing /tmp/test_202311281818.zarr
Listing /tmp/test_202311281818.zarr/group6
Listing /tmp/test_202311281818.zarr/group6/array
Listing /tmp/test_202311281818.zarr/group6/x
Listing /tmp/test_202311281818.zarr/group6/a
Listing /tmp/test_202311281818.zarr/group6/b
Listing /tmp/test_202311281818.zarr/group7
Listing /tmp/test_202311281818.zarr/group7/array
Listing /tmp/test_202311281818.zarr/group7/x
Listing /tmp/test_202311281818.zarr/group7/a
Listing /tmp/test_202311281818.zarr/group7/b
Listing /tmp/test_202311281818.zarr/group3
Listing /tmp/test_202311281818.zarr/group3/array
Listing /tmp/test_202311281818.zarr/group3/x
Listing /tmp/test_202311281818.zarr/group3/a
Listing /tmp/test_202311281818.zarr/group3/b
Listing /tmp/test_202311281818.zarr/group8
Listing /tmp/test_202311281818.zarr/group8/array
Listing /tmp/test_202311281818.zarr/group8/x
Listing /tmp/test_202311281818.zarr/group8/a
Listing /tmp/test_202311281818.zarr/group8/b
Listing /tmp/test_202311281818.zarr/group4
Listing /tmp/test_202311281818.zarr/group4/array
Listing /tmp/test_202311281818.zarr/group4/x
Listing /tmp/test_202311281818.zarr/group4/a
Listing /tmp/test_202311281818.zarr/group4/b
Listing /tmp/test_202311281818.zarr/group5
Listing /tmp/test_202311281818.zarr/group5/array
Listing /tmp/test_202311281818.zarr/group5/x
Listing /tmp/test_202311281818.zarr/group5/a
Listing /tmp/test_202311281818.zarr/group5/b
Listing /tmp/test_202311281818.zarr/group1
Listing /tmp/test_202311281818.zarr/group1/array
Listing /tmp/test_202311281818.zarr/group1/x
Listing /tmp/test_202311281818.zarr/group1/a
Listing /tmp/test_202311281818.zarr/group1/b
Listing /tmp/test_202311281818.zarr/group9
Listing /tmp/test_202311281818.zarr/group9/array
Listing /tmp/test_202311281818.zarr/group9/x
Listing /tmp/test_202311281818.zarr/group9/a
Listing /tmp/test_202311281818.zarr/group9/b
Listing /tmp/test_202311281818.zarr/group2
Listing /tmp/test_202311281818.zarr/group2/array
Listing /tmp/test_202311281818.zarr/group2/x
Listing /tmp/test_202311281818.zarr/group2/a
Listing /tmp/test_202311281818.zarr/group2/b
Listing /tmp/test_202311281818.zarr/group0
Listing /tmp/test_202311281818.zarr/group0/array
Listing /tmp/test_202311281818.zarr/group0/x
Listing /tmp/test_202311281818.zarr/group0/a
Listing /tmp/test_202311281818.zarr/group0/b
Opening /tmp/test_202311281818.zarr/.zgroup
Opening /tmp/test_202311281818.zarr/group0/.zattrs
Opening /tmp/test_202311281818.zarr/group0/.zgroup
Opening /tmp/test_202311281818.zarr/group0/a/.zarray
Opening /tmp/test_202311281818.zarr/group0/a/.zattrs
Opening /tmp/test_202311281818.zarr/group0/array/.zarray
Opening /tmp/test_202311281818.zarr/group0/array/.zattrs
Opening /tmp/test_202311281818.zarr/group0/b/.zarray
Opening /tmp/test_202311281818.zarr/group0/b/.zattrs
Opening /tmp/test_202311281818.zarr/group0/x/.zarray
Opening /tmp/test_202311281818.zarr/group0/x/.zattrs
Opening /tmp/test_202311281818.zarr/group1/.zattrs
Opening /tmp/test_202311281818.zarr/group1/.zgroup
Opening /tmp/test_202311281818.zarr/group1/a/.zarray
Opening /tmp/test_202311281818.zarr/group1/a/.zattrs
Opening /tmp/test_202311281818.zarr/group1/array/.zarray
Opening /tmp/test_202311281818.zarr/group1/array/.zattrs
Opening /tmp/test_202311281818.zarr/group1/b/.zarray
Opening /tmp/test_202311281818.zarr/group1/b/.zattrs
Opening /tmp/test_202311281818.zarr/group1/x/.zarray
Opening /tmp/test_202311281818.zarr/group1/x/.zattrs
Opening /tmp/test_202311281818.zarr/group2/.zattrs
Opening /tmp/test_202311281818.zarr/group2/.zgroup
Opening /tmp/test_202311281818.zarr/group2/a/.zarray
Opening /tmp/test_202311281818.zarr/group2/a/.zattrs
Opening /tmp/test_202311281818.zarr/group2/array/.zarray
Opening /tmp/test_202311281818.zarr/group2/array/.zattrs
Opening /tmp/test_202311281818.zarr/group2/b/.zarray
Opening /tmp/test_202311281818.zarr/group2/b/.zattrs
Opening /tmp/test_202311281818.zarr/group2/x/.zarray
Opening /tmp/test_202311281818.zarr/group2/x/.zattrs
Opening /tmp/test_202311281818.zarr/group3/.zattrs
Opening /tmp/test_202311281818.zarr/group3/.zgroup
Opening /tmp/test_202311281818.zarr/group3/a/.zarray
Opening /tmp/test_202311281818.zarr/group3/a/.zattrs
Opening /tmp/test_202311281818.zarr/group3/array/.zarray
Opening /tmp/test_202311281818.zarr/group3/array/.zattrs
Opening /tmp/test_202311281818.zarr/group3/b/.zarray
Opening /tmp/test_202311281818.zarr/group3/b/.zattrs
Opening /tmp/test_202311281818.zarr/group3/x/.zarray
Opening /tmp/test_202311281818.zarr/group3/x/.zattrs
Opening /tmp/test_202311281818.zarr/group4/.zattrs
Opening /tmp/test_202311281818.zarr/group4/.zgroup
Opening /tmp/test_202311281818.zarr/group4/a/.zarray
Opening /tmp/test_202311281818.zarr/group4/a/.zattrs
Opening /tmp/test_202311281818.zarr/group4/array/.zarray
Opening /tmp/test_202311281818.zarr/group4/array/.zattrs
Opening /tmp/test_202311281818.zarr/group4/b/.zarray
Opening /tmp/test_202311281818.zarr/group4/b/.zattrs
Opening /tmp/test_202311281818.zarr/group4/x/.zarray
Opening /tmp/test_202311281818.zarr/group4/x/.zattrs
Opening /tmp/test_202311281818.zarr/group5/.zattrs
Opening /tmp/test_202311281818.zarr/group5/.zgroup
Opening /tmp/test_202311281818.zarr/group5/a/.zarray
Opening /tmp/test_202311281818.zarr/group5/a/.zattrs
Opening /tmp/test_202311281818.zarr/group5/array/.zarray
Opening /tmp/test_202311281818.zarr/group5/array/.zattrs
Opening /tmp/test_202311281818.zarr/group5/b/.zarray
Opening /tmp/test_202311281818.zarr/group5/b/.zattrs
Opening /tmp/test_202311281818.zarr/group5/x/.zarray
Opening /tmp/test_202311281818.zarr/group5/x/.zattrs
Opening /tmp/test_202311281818.zarr/group6/.zattrs
Opening /tmp/test_202311281818.zarr/group6/.zgroup
Opening /tmp/test_202311281818.zarr/group6/a/.zarray
Opening /tmp/test_202311281818.zarr/group6/a/.zattrs
Opening /tmp/test_202311281818.zarr/group6/array/.zarray
Opening /tmp/test_202311281818.zarr/group6/array/.zattrs
Opening /tmp/test_202311281818.zarr/group6/b/.zarray
Opening /tmp/test_202311281818.zarr/group6/b/.zattrs
Opening /tmp/test_202311281818.zarr/group6/x/.zarray
Opening /tmp/test_202311281818.zarr/group6/x/.zattrs
Opening /tmp/test_202311281818.zarr/group7/.zattrs
Opening /tmp/test_202311281818.zarr/group7/.zgroup
Opening /tmp/test_202311281818.zarr/group7/a/.zarray
Opening /tmp/test_202311281818.zarr/group7/a/.zattrs
Opening /tmp/test_202311281818.zarr/group7/array/.zarray
Opening /tmp/test_202311281818.zarr/group7/array/.zattrs
Opening /tmp/test_202311281818.zarr/group7/b/.zarray
Opening /tmp/test_202311281818.zarr/group7/b/.zattrs
Opening /tmp/test_202311281818.zarr/group7/x/.zarray
Opening /tmp/test_202311281818.zarr/group7/x/.zattrs
Opening /tmp/test_202311281818.zarr/group8/.zattrs
Opening /tmp/test_202311281818.zarr/group8/.zgroup
Opening /tmp/test_202311281818.zarr/group8/a/.zarray
Opening /tmp/test_202311281818.zarr/group8/a/.zattrs
Opening /tmp/test_202311281818.zarr/group8/array/.zarray
Opening /tmp/test_202311281818.zarr/group8/array/.zattrs
Opening /tmp/test_202311281818.zarr/group8/b/.zarray
Opening /tmp/test_202311281818.zarr/group8/b/.zattrs
Opening /tmp/test_202311281818.zarr/group8/x/.zarray
Opening /tmp/test_202311281818.zarr/group8/x/.zattrs
Opening /tmp/test_202311281818.zarr/group9/.zattrs
Opening /tmp/test_202311281818.zarr/group9/.zgroup
Opening /tmp/test_202311281818.zarr/group9/a/.zarray
Opening /tmp/test_202311281818.zarr/group9/a/.zattrs
Opening /tmp/test_202311281818.zarr/group9/array/.zarray
Opening /tmp/test_202311281818.zarr/group9/array/.zattrs
Opening /tmp/test_202311281818.zarr/group9/b/.zarray
Opening /tmp/test_202311281818.zarr/group9/b/.zattrs
Opening /tmp/test_202311281818.zarr/group9/x/.zarray
Opening /tmp/test_202311281818.zarr/group9/x/.zattrs
called info /tmp/test_202311281818.zarr/.zmetadata
called makedirs /tmp/test_202311281818.zarr
Opening /tmp/test_202311281818.zarr/.zmetadata
Opening /tmp/test_202311281818.zarr/.zmetadata
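To summarize output like the above without reading it line by line, a small helper can tally the logged calls per top-level group. This is a sketch: tally_by_group is a hypothetical helper written for this report, and the sample holds only a few lines copied from the extract.

```python
from collections import Counter


def tally_by_group(log_lines, store_root):
    """Count logged filesystem calls per top-level group under store_root."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        idx = line.find(store_root)
        if idx == -1:
            continue  # separator lines such as "group 9" carry no path
        rel = line[idx + len(store_root):].lstrip("/")
        top = rel.split("/", 1)[0] if rel else "<root>"
        if top.startswith("."):
            top = "<root>"  # .zgroup, .zmetadata, ... sit at the store root
        counts[top] += 1
    return counts


sample = [
    "group 9",
    "called info /tmp/test_202311281818.zarr/group9/.zarray",
    "Listing /tmp/test_202311281818.zarr",
    "Listing /tmp/test_202311281818.zarr/group6",
    "Listing /tmp/test_202311281818.zarr/group6/array",
    "Opening /tmp/test_202311281818.zarr/.zgroup",
    "Opening /tmp/test_202311281818.zarr/group0/.zattrs",
]
counts = tally_by_group(sample, "/tmp/test_202311281818.zarr")
for group, n in sorted(counts.items()):
    print(group, n)
```

Feeding it the full captured output of one insertion shows at a glance how many calls hit groups other than the one being written.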
Anything else we need to know?
No response
Environment
xarray: 2023.11.0
pandas: 2.1.3
numpy: 1.26.2
scipy: 1.11.4
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 8.18.1
sphinx: None