Description
Although the V3 Group
does not inherit from Mapping
or MutableMapping
, it has methods that are mapping-like, e.g. __contains__
, __getitem__
, __iter__
, __delitem__
etc. I think this makes sense in terms of continuity with zarr v2, where Group
explicitly implemented MutableMapping
, but I'm not sure we ultimately want to do this.
Here I will argue against making Group
to be dict-like.
First, an argument from fairness: conceptually, a Group
can be expressed as the intersection of "a thing that contains sub-groups and sub-arrays" and "a thing that contains JSON-like attributes". Both "contains sub-groups and sub-arrays" and "contains JSON" can be implemented with the MutableMapping
API. So a Group
is two mutable mappings, not one. But if Group
is itself a MutableMapping
, then said methods have to pick one of the two possible sub-mappings to target. I don't feel confident in stating that the members of a group are more or less "important" than the attributes of the group. So I don't feel confident in picking one of these two mappings as the target for MutableMapping
methods on Group
. Thus I think using MutableMapping
for the members of a Group, while delegating the same API for attributes under a .attrs
is unfair (it's an arbitrary, and unnecessary, decision), and probably a mistake in V2, and also a mistake in h5py (where I think this API originated). There's a simple solution here: treat Group.members
and Group.attributes
each as their own MutableMapping
, and don't pretend that Group
is a dict, because it's not -- if anything, it's two dicts.
Second, an argument from ergonomics. We want to support something like x = Group["subarray"]
, or x = Group.members["subarray"]
. This expands to x = (access the metadata for "subarray", then return an in-memory Array instance from it)
. Creating that in-memory Array
instance will almost certainly take runtime parameters, e.g. write_empty_chunks
, caching control, etc. But __getitem__
is hard to parametrize. If we rely on Group["subarray"]
, we are forced to choose from a few unattractive options:
- we can define those
Array
parameters in a global configuration object, or as properties of theGroup
instance - we can ask users to modify the
Array
instance after assigning it. - we can remove those runtime parameters from the
Array
constructor and push them down to the array methods (which honestly I kind of like)
But if we leave behind the dict API, then we just have regular functions that can be parametrized as needed, which gives us a lot of flexibility. So I would be curious to know what we get from the dict API that we would stand to lose here, besides continuity with v2, which I don't value too much at this point.
So, my conclusion is that we should not implement Group.__getitem__
, or any other Mapping
-like method on Group
. There might be a good argument for implementing the MutableMapping
API for Group.attrs
and Group.members
, but maybe not. Curious to hear whether anyone has strong objections to this direction.
Metadata
Metadata
Assignees
Type
Projects
Status