[v3] Refactoring our serialization handling #2144

Open
@TomAugspurger

Zarr version

v3

Numcodecs version

na

Python Version

na

Operating System

na

Installation

na

Description

While working on consolidated metadata, I ran into some awkwardness in our serialization logic. It ended up being fine, but I wonder if we could do a bit better by implementing dedicated logic for serializing and deserializing the Zarr metadata objects.

For now, I'll assume that we aren't interested in using a third-party library like msgspec, pydantic, or cattrs for this. My current implementation is at https://github.com/TomAugspurger/zarr-python/blob/feature/serde/src/zarr/_serialization.py. It's pretty complicated right now, but I'll be able to clean some things up, both in that file and in the various to_dict / from_dict methods currently on our objects. I personally have a pretty high bar for libraries taking on dependencies, so I think the extra complexity is worth it, but others may disagree.

I wrote up https://tomaugspurger.net/posts/serializing-dataclasses/ with a little introduction based on what I've learned so far. The short version is:

  • Keep a similar system for serialization (json.dumps(obj, default=custom_converter_function)), but centralize all the serialization logic in one spot.
  • Use type hints (at runtime) to know what object to deserialize into. This gets complicated, but it's not too bad.
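
To make the two bullets concrete, here is a minimal sketch, not the code from the linked branch. `Order`, `ChunkGrid`, `ArrayMeta`, `encode`, and `decode` are all hypothetical names made up for illustration; only `json.dumps(..., default=...)`, `dataclasses.fields`, and `typing.get_type_hints` are real standard-library APIs.

```python
import dataclasses
import json
import typing
from enum import Enum


class Order(Enum):  # hypothetical enum, not a zarr-python type
    C = "C"
    F = "F"


@dataclasses.dataclass(frozen=True)
class ChunkGrid:  # hypothetical stand-in for a nested metadata object
    name: str
    size: int


@dataclasses.dataclass(frozen=True)
class ArrayMeta:  # hypothetical stand-in for a top-level metadata object
    zarr_format: int
    order: Order
    chunk_grid: ChunkGrid


def encode(obj):
    """The single hook passed to json.dumps(default=...)."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # Shallow conversion: json.dumps calls encode() again for any
        # nested value it cannot serialize on its own.
        return {f.name: getattr(obj, f.name) for f in dataclasses.fields(obj)}
    if isinstance(obj, Enum):
        return obj.value
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def decode(cls, data):
    """Rebuild an instance of cls from plain JSON data using runtime type hints."""
    if dataclasses.is_dataclass(cls):
        hints = typing.get_type_hints(cls)
        return cls(**{name: decode(hints[name], value) for name, value in data.items()})
    if isinstance(cls, type) and issubclass(cls, Enum):
        return cls(data)
    # Plain JSON scalars (int, str, ...) pass through unchanged.
    return data


meta = ArrayMeta(zarr_format=3, order=Order.C, chunk_grid=ChunkGrid("regular", 10))
text = json.dumps(meta, default=encode)
roundtripped = decode(ArrayMeta, json.loads(text))
```

The point of the sketch is that the objects themselves carry no to_dict / from_dict methods: all conversion rules live in the two functions.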

I'm not sure yet, but this might affect the __init__ / __post_init__ of our dataclasses. In general, I'd like to move some of the parse_<thing> functions found in many of our dataclasses to the boundary of the program, and have the __init__ of each dataclass be the default one generated by @dataclass. This will mean duplicating some logic in the external-facing methods like zarr.open, zarr.create, and Group.create, but IMO that's worth it.
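
As a rough illustration of what "parsing at the boundary" could look like, here is a small sketch. `ArraySpec`, `parse_shape`, and `create` are hypothetical names, not the existing zarr-python functions; the dataclass has no `__post_init__`, and all normalization happens in the entry point.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class ArraySpec:  # hypothetical; uses the plain __init__ generated by @dataclass
    shape: Tuple[int, ...]
    dtype: str


def parse_shape(value: Union[int, Iterable[int]]) -> Tuple[int, ...]:
    """Normalize user input into a tuple of non-negative ints."""
    if isinstance(value, int):
        value = (value,)
    shape = tuple(int(v) for v in value)
    if any(v < 0 for v in shape):
        raise ValueError(f"shape must be non-negative, got {shape}")
    return shape


def create(shape: Union[int, Iterable[int]], dtype: str = "float64") -> ArraySpec:
    """Hypothetical external-facing entry point: validate here, then construct."""
    return ArraySpec(shape=parse_shape(shape), dtype=dtype)
```

With this split, the deserializer can call `ArraySpec(...)` directly on already-valid data, while user-facing functions pay the validation cost once.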

I still have some work to do (properly parsing nested objects like lists and tuples), but wanted to share https://tomaugspurger.net/posts/serializing-dataclasses/ sooner rather than later.
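
For the nested-container case, one possible approach (an assumption on my part, not what the branch does) is to inspect the hint with `typing.get_origin` and `typing.get_args`. `decode_hint`, `Codec`, and `Meta` below are hypothetical names; the sketch handles only `list[T]`, `tuple[T, ...]`, fixed-length tuples, and dataclasses.

```python
import dataclasses
import typing


def decode_hint(hint, data):
    """Rebuild data according to hint, recursing into lists and tuples."""
    origin = typing.get_origin(hint)
    args = typing.get_args(hint)
    if origin is list:
        return [decode_hint(args[0], item) for item in data]
    if origin is tuple:
        if len(args) == 2 and args[1] is Ellipsis:
            # Homogeneous, variable-length tuple such as tuple[int, ...].
            return tuple(decode_hint(args[0], item) for item in data)
        # Fixed-length tuple such as tuple[str, int]: pair hints with items.
        return tuple(decode_hint(a, item) for a, item in zip(args, data))
    if dataclasses.is_dataclass(hint):
        hints = typing.get_type_hints(hint)
        return hint(**{k: decode_hint(hints[k], v) for k, v in data.items()})
    # Scalars and unparameterized hints pass through unchanged.
    return data


@dataclasses.dataclass
class Codec:  # hypothetical nested object
    name: str


@dataclasses.dataclass
class Meta:  # hypothetical metadata object with nested containers
    shape: typing.Tuple[int, ...]
    codecs: typing.List[Codec]
    fill: typing.Tuple[str, int]


raw = {"shape": [2, 3], "codecs": [{"name": "gzip"}], "fill": ["a", 1]}
meta = decode_hint(Meta, raw)
```

JSON has no tuple type, so the tuple branches are what turn the decoded lists back into tuples. Unions and optional fields are left out of this sketch.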

Steps to reproduce

n/a

Additional output

No response

Labels

bug (Potential issues with the zarr-python library)
