[V3] Should Group be dict-like? #1787

Closed
@d-v-b


Although the V3 Group does not inherit from Mapping or MutableMapping, it has methods that are mapping-like, e.g. __contains__, __getitem__, __iter__, __delitem__ etc. I think this makes sense in terms of continuity with zarr v2, where Group explicitly implemented MutableMapping, but I'm not sure we ultimately want to do this.

Here I will argue against making Group dict-like.

First, an argument from fairness: conceptually, a Group can be expressed as the intersection of "a thing that contains sub-groups and sub-arrays" and "a thing that contains JSON-like attributes". Both "contains sub-groups and sub-arrays" and "contains JSON" can be implemented with the MutableMapping API. So a Group is two mutable mappings, not one. But if Group is itself a MutableMapping, then its mapping methods have to pick one of the two possible sub-mappings to target. I don't feel confident in stating that the members of a group are more or less "important" than the attributes of the group, so I don't feel confident in picking one of these two mappings as the target for MutableMapping methods on Group. Thus I think using MutableMapping for the members of a Group, while relegating the same API for attributes to a separate .attrs property, is unfair (it's an arbitrary and unnecessary decision), and probably a mistake in V2, and also a mistake in h5py (where I think this API originated). There's a simple solution here: treat Group.members and Group.attributes each as their own MutableMapping, and don't pretend that Group is a dict, because it's not. If anything, it's two dicts.
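To make the "two dicts" idea concrete, here is a minimal sketch, not zarr's actual implementation. The class `DictBacked` and the in-memory storage are hypothetical stand-ins for illustration; only the attribute names `members` and `attrs` come from the discussion above.

```python
from collections.abc import Mapping, MutableMapping
from typing import Any, Iterator


class DictBacked(MutableMapping):
    """Hypothetical MutableMapping over an in-memory dict (stand-in for a store)."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


class Group:
    """Hypothetical group: it owns two mappings but is not a mapping itself."""

    def __init__(self) -> None:
        self.members = DictBacked()  # sub-groups and sub-arrays
        self.attrs = DictBacked()    # JSON-like attributes


g = Group()
g.members["subarray"] = "array placeholder"
g.attrs["units"] = "metres"
```

With this shape, neither mapping is privileged: `"subarray" in g.members` and `"units" in g.attrs` are both explicit, and `g["subarray"]` is simply not defined.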

Second, an argument from ergonomics. We want to support something like x = Group["subarray"], or x = Group.members["subarray"]. This expands to x = (access the metadata for "subarray", then return an in-memory Array instance from it). Creating that in-memory Array instance will almost certainly take runtime parameters, e.g. write_empty_chunks, caching control, etc. But __getitem__ is hard to parametrize. If we rely on Group["subarray"], we are forced to choose from a few unattractive options:

  • we can define those Array parameters in a global configuration object, or as properties of the Group instance
  • we can ask users to modify the Array instance after assigning it
  • we can remove those runtime parameters from the Array constructor and push them down to the array methods (which honestly I kind of like)

But if we leave behind the dict API, then we just have regular functions that can be parametrized as needed, which gives us a lot of flexibility. So I would be curious to know what we get from the dict API that we would stand to lose here, besides continuity with v2, which I don't value too much at this point.
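The contrast between the two access styles can be sketched roughly as follows. Everything here is hypothetical and for illustration only: the `Array` dataclass, both group classes, the `get_array` method name, and the default value of `write_empty_chunks` are assumptions, not zarr's API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Array:
    """Hypothetical in-memory array handle carrying a runtime option."""

    name: str
    write_empty_chunks: bool = True  # default chosen arbitrarily for this sketch


class DictLikeGroup:
    """Dict-style access: __getitem__ only receives the key."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names = set(names)

    def __getitem__(self, name: str) -> Array:
        if name not in self._names:
            raise KeyError(name)
        # No room for per-call options; they must come from elsewhere.
        return Array(name)


class FunctionalGroup:
    """Function-style access: options are ordinary keyword arguments."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names = set(names)

    def get_array(self, name: str, *, write_empty_chunks: bool = True) -> Array:
        if name not in self._names:
            raise KeyError(name)
        return Array(name, write_empty_chunks=write_empty_chunks)


a = DictLikeGroup(["subarray"])["subarray"]
b = FunctionalGroup(["subarray"]).get_array("subarray", write_empty_chunks=False)
```

In the dict-style version the caller would have to mutate `a` afterwards or rely on global configuration, which are the first two options listed above; the function-style version takes the option at the call site.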

So, my conclusion is that we should not implement Group.__getitem__, or any other Mapping-like method on Group. There might be a good argument for implementing the MutableMapping API for Group.attrs and Group.members, but maybe not. Curious to hear whether anyone has strong objections to this direction.

cc @jhamman @normanrz
