The V3 spec introduces the concept of an "implicit group", i.e. a Zarr group that has no metadata, and is defined instead by its descendants in the storage hierarchy. Work on the V3 branch has led me to believe that supporting implicit groups will be a significant headache for zarr-python, because implicit groups make the state of the storage backend needlessly ambiguous: it is not possible to distinguish a random prefix from an actual implicit group without expensive IO operations (directory listing), which not all stores support. For visibility, I will repost the contents of #1743 (comment) here:
Because of the intersection of the store design and implicit groups, we are a bit squeezed when it comes to performance / ergonomics for discovery routines like `Group.open()`. Consider this code:
```python
# create a group
g = Group.open("group_path")
g.open("X")
```

There are three possibilities to handle:
* `group_path/X/zarr.json` exists. Then we attempt to parse it and produce a group or array.
* `group_path/X` is the key of an object in the store. Then `group_path/X/zarr.json` cannot exist, so we should raise a `KeyError`.
* The key `group_path/X/zarr.json` does not exist. This has two sub-cases:
  * `group_path/X/` is a prefix for the key of some Zarr metadata object, like `group_path/X/Y/zarr.json`. Then `group_path/X` should be treated as the key for an implicit group according to the spec.
  * `group_path/X/` is not a prefix for any key in the store (e.g., because `X` is just a random string a user typed in by accident). Numerically, this is the most common situation, since most stores contain only a small subset of all possible keys. The only way we can test for this is by calling `store.list_dir` and checking whether `X` appears in the list of prefixes returned by `list_dir`.

I don't think calling `list_dir` for every `Group.open()` call is a good idea, but that's our only option with the current v3 spec and the current store design.
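To make the cost of the third case concrete, here is a minimal sketch of that resolution logic. It is not zarr-python code: `DictStore`, its `get`/`list_dir` methods, and `open_node` are hypothetical stand-ins, the store is just an in-memory dict, and nodes are reported as strings rather than real `Group`/`Array` objects. The store counts requests so the difference between a direct hit and a miss is visible.

```python
import json


class DictStore:
    """Hypothetical in-memory store: a flat mapping of string keys to bytes."""

    def __init__(self, data):
        self.data = dict(data)
        self.calls = {"get": 0, "list_dir": 0}

    def get(self, key):
        self.calls["get"] += 1
        return self.data.get(key)  # None when the key is absent

    def list_dir(self, prefix):
        # Scans every key. On an object store this would be a (paginated)
        # LIST request, far more expensive than a single GET.
        self.calls["list_dir"] += 1
        prefix = prefix.rstrip("/") + "/"
        return sorted(
            {key[len(prefix):].split("/", 1)[0] for key in self.data if key.startswith(prefix)}
        )


def open_node(store, group_path, name):
    """Resolve group_path/name using the three cases above (illustrative only)."""
    path = f"{group_path}/{name}"
    # Case 1: an explicit metadata document exists; parse it.
    raw = store.get(f"{path}/zarr.json")
    if raw is not None:
        return json.loads(raw)["node_type"]  # "group" or "array"
    # Case 2: the path is itself an object key, so it cannot be a node.
    if store.get(path) is not None:
        raise KeyError(path)
    # Case 3: no metadata. Only a listing can separate an implicit group from
    # a name that does not exist; as described above, this just checks that
    # `name` shows up among the listed prefixes.
    if name in store.list_dir(group_path):
        return "implicit group"
    raise KeyError(path)


def meta(node_type):
    return json.dumps({"zarr_format": 3, "node_type": node_type}).encode()


store = DictStore({
    "group_path/zarr.json": meta("group"),
    "group_path/A/zarr.json": meta("array"),
    "group_path/B/Y/zarr.json": meta("group"),  # B has no zarr.json of its own
    "group_path/C": b"not a node",
})
```

In this sketch, `open_node(store, "group_path", "A")` costs one GET, while both the implicit group `"B"` and a typo such as `"blabalabla"` cost two GETs plus a full listing of `group_path/`, and only the listing tells them apart.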
Someone please correct me if I'm wrong here, but it looks like if we support implicit groups, and we don't want `Group.open(blabalabla)` to nearly always succeed, then we have to accept terrible performance (due to the need to run a directory listing). It also means that stores that don't support directory listing will appear to contain an infinite number of implicit groups, because nothing lets them rule out a name that has no metadata. Not great. We should really remove implicit groups from the spec.
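The claim about stores without directory listing can be illustrated with a small sketch. It assumes a hypothetical GET-only store (a plain dict) and a hypothetical resolver `open_node_without_listing` that honours implicit groups; neither is a zarr-python API. Once the metadata lookup misses and the path is not an object key, the resolver has nothing left to check.

```python
import json
import secrets

# Hypothetical GET-only store: a plain dict, with no way to list prefixes.
store = {
    "group_path/zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}).encode(),
}


def open_node_without_listing(store, group_path, name):
    """Resolve group_path/name on a store that cannot list (illustrative only)."""
    path = f"{group_path}/{name}"
    raw = store.get(f"{path}/zarr.json")
    if raw is not None:
        return json.loads(raw)["node_type"]
    if path in store:
        raise KeyError(path)  # a plain object cannot be a node
    # No metadata, and no listing to rule out descendant metadata. If implicit
    # groups are honoured, nothing lets the resolver reject the name.
    return "implicit group"


# Every random name "exists" as an implicit group.
names = [secrets.token_hex(8) for _ in range(5)]
results = [open_node_without_listing(store, "group_path", n) for n in names]
```

The only alternative policy for such a store is the opposite one (never report an implicit group), which means implicit groups that really are there can never be opened.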
One solution is to simply not support implicit groups. Does anyone object if zarr-python does not support implicit groups? This would make a lot of things easier for the implementation. Obviously it's bad if zarr-python isn't a complete implementation of the v3 spec, but supporting a feature that is (in my opinion) broken and confusing might be worse.
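For comparison, here is a sketch of what discovery could look like if every group must carry its own `zarr.json`. As before, `CountingStore` and `open_node_explicit` are hypothetical illustrations, not zarr-python APIs. A single GET decides every case, and the "path is a plain object" case needs no separate check, because such a path can never have a `zarr.json` beneath it.

```python
import json


class CountingStore:
    """Hypothetical in-memory store that counts GET requests."""

    def __init__(self, data):
        self.data = dict(data)
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self.data.get(key)


def open_node_explicit(store, group_path, name):
    """Resolve group_path/name when every group must have its own zarr.json."""
    path = f"{group_path}/{name}"
    raw = store.get(f"{path}/zarr.json")
    if raw is None:
        # No metadata document means no node; this also covers the case where
        # group_path/name is a plain object. No listing is ever needed.
        raise KeyError(path)
    return json.loads(raw)["node_type"]


store = CountingStore({
    "group_path/zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}).encode(),
    "group_path/A/zarr.json": json.dumps({"zarr_format": 3, "node_type": "array"}).encode(),
})
```

Under this assumption, both a hit and a typo cost exactly one GET, and stores without listing behave the same as stores with it.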
A parallel solution is to remove implicit groups from the spec itself. Discussion of that topic is here: zarr-developers/zarr-specs#291