More efficient Group.arrays() for v3 stores #1721

Closed
@dcherian

Zarr version

v2.16.1

Numcodecs version

n/a

Python Version

3.11

Operating System

mac

Installation

conda

Description

Xarray uses .arrays() to iterate over the arrays in a group.
https://github.com/pydata/xarray/blob/4a0bb2eb80538806468233d11bc5a4c06ffb417e/xarray/backends/zarr.py#L539

The implementation is a serial for loop that lists the group and then, for each array, requests its .array.json metadata document and constructs the Zarr array to return:

for key in sorted(listdir(self._store, dir_name)):
    if key.endswith(array_sfx):
        key = key[: -len(array_sfx)]
        _key = key.rstrip("/")
        yield _key if keys_only else (_key, self[key])
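To make the cost of that loop concrete, here is a minimal sketch that mimics its access pattern against a hypothetical in-memory store. CountingStore and serial_arrays are illustrative names, not zarr APIs, and each counted read stands in for one GET against object storage. The point is only that the number of blocking metadata reads equals the number of arrays.

```python
import json


class CountingStore(dict):
    """Hypothetical dict-backed store that counts reads; each read stands in for one GET."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.requests = 0

    def __getitem__(self, key):
        self.requests += 1
        return super().__getitem__(key)


def serial_arrays(store, dir_name, array_sfx=".array.json"):
    """Mimic the loop above: list the group, then read each array's metadata one by one."""
    prefix = dir_name.rstrip("/") + "/"
    # Stand-in for listdir(): direct children of dir_name (iterating keys costs no reads here).
    children = sorted(
        k[len(prefix):] for k in store if k.startswith(prefix) and "/" not in k[len(prefix):]
    )
    for key in children:
        if key.endswith(array_sfx):
            name = key[: -len(array_sfx)]
            # One blocking read per array, like self[key] opening each Zarr array in turn.
            yield name, json.loads(store[prefix + key])
```

With 100 arrays in the group, store.requests ends up at 100, and on cloud object storage those reads happen strictly one after another.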

It would be nice to be more efficient here when opening a store with O(100) variables on cloud object storage, where each of these sequential requests pays a full round trip.

Here's one idea that comes to mind. This code already knows which JSON documents it needs (from the listdir call), so Zarr could request all of them at once using store.getitems and then use the results to construct the array objects.

I don't immediately see how to enable this though. Perhaps there are other solutions.
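A rough sketch of the batching idea follows, under loudly stated assumptions: SlowStore, its getitems method (including the max_workers parameter) and batched_arrays are all hypothetical and are not zarr's store API. The batched read is swapped in as a thread pool that issues every GET concurrently; zarr's real store.getitems may batch quite differently. The shape of the change is what matters: list once, fetch all .array.json documents in one call, then build the arrays from metadata already in memory.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor


class SlowStore:
    """Hypothetical store where each single-key read pays a fixed simulated latency."""

    def __init__(self, data, latency=0.05):
        self._data = dict(data)
        self.latency = latency

    def keys(self):
        return list(self._data)

    def get(self, key):
        time.sleep(self.latency)  # simulated network round trip
        return self._data[key]

    def getitems(self, keys, max_workers=32):
        """Hypothetical batched read: issue all GETs concurrently, return {key: value}."""
        keys = list(keys)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return dict(zip(keys, pool.map(self.get, keys)))


def batched_arrays(store, dir_name, array_sfx=".array.json"):
    """List once, fetch every array's metadata in a single getitems call."""
    prefix = dir_name.rstrip("/") + "/"
    names = sorted(
        k[len(prefix):-len(array_sfx)]
        for k in store.keys()
        if k.startswith(prefix) and k.endswith(array_sfx) and "/" not in k[len(prefix):]
    )
    docs = store.getitems(prefix + n + array_sfx for n in names)
    # In zarr, array objects would be constructed here from the pre-fetched metadata.
    return {n: json.loads(docs[prefix + n + array_sfx]) for n in names}
```

With the simulated latency, the wall-clock time is governed by how many rounds of concurrent requests are needed rather than by the number of arrays.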

Steps to reproduce

n/a

Additional output

No response

Metadata

Assignees: No one assigned
Labels: enhancement (New features or improvements)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
