ZEP9: Parse Metadata Objects #2866
This looks really good!
Thanks!
What are the current use cases for this?
There are a number of extensions appearing now in zarr-extensions. Potentially, there will also be modifications to existing extensions such as added attributes in the metadata. zarr-python should fail if it encounters unknown extensions or attributes, unless marked with
As I understand it, all of the extensions proposed in zarr-extensions have static JSON representations. Is this not true? And this PR seems to allow the possibility that any one of those JSON representations might gain new fields, which should be ignored iff those fields are JSON objects containing the
There may be fields that are not strictly necessary for reading data and that could be marked as optional. An example might be a "chunk_layout" field in the sharding codec to denote how the chunks are ordered in the shard (e.g. morton, c, or random). While useful when writing, it is not necessary for reading because all chunk offsets are stored in the index. Additionally, there may be new optional fields that are added to the root of the array or group metadata through a ZEP.
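To make the behaviour under discussion concrete, here is a minimal sketch in plain Python. It is not the code in this PR and not zarr-python's API: `parse_metadata_object`, `UnknownMetadataFieldError` and `KNOWN_SHARDING_FIELDS` are hypothetical names. It also assumes that the marker cut off in the comments above is the `"must_understand": false` key used by the Zarr v3 specification; the thread as captured does not confirm this.

```python
from typing import Any, Mapping


class UnknownMetadataFieldError(ValueError):
    """Hypothetical error for a field the reader does not know and may not ignore."""


def parse_metadata_object(
    obj: Mapping[str, Any], known_fields: set[str]
) -> dict[str, Any]:
    """Keep known fields, skip ignorable unknown ones, fail on everything else.

    Assumption: an unknown field is ignorable only when its value is a JSON
    object carrying "must_understand": false.
    """
    parsed: dict[str, Any] = {}
    for key, value in obj.items():
        if key in known_fields:
            parsed[key] = value
        elif isinstance(value, Mapping) and value.get("must_understand") is False:
            # Explicitly marked optional: safe to skip when reading.
            continue
        else:
            raise UnknownMetadataFieldError(f"unknown metadata field: {key!r}")
    return parsed


# Hypothetical sharding configuration with the optional "chunk_layout" field
# from the example above.
KNOWN_SHARDING_FIELDS = {"chunk_shape", "codecs", "index_codecs", "index_location"}
sharding_config = {
    "chunk_shape": [32, 32],
    "chunk_layout": {"name": "morton", "must_understand": False},
}
parsed = parse_metadata_object(sharding_config, KNOWN_SHARDING_FIELDS)
print(parsed)
```

In this sketch the reader drops `chunk_layout`, so writing `parsed` back out would not reproduce the original metadata. That is the round-trip concern raised in the reply that follows.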
These examples are not yet in use, which is why I asked what the current use cases are. This PR makes changes to how metadata is parsed (e.g., checking the contents of gzip JSON metadata) that, as far as I can tell, have no use. Currently, I think people can expect that zarr-python can round-trip zarr data. To me, that means that if zarr-python can read zarr data from one place, it should be able to create a structurally identical copy of that zarr data somewhere else. The concept in this PR -- that we would support extra metadata fields in any metadata object which can be ignored when reading -- violates this expectation. So I think we need to have a larger conversation about what these hypothetical optional metadata fields mean for zarr-python before we add support for them. Until there are real examples out there of metadata with these
Fair enough. Then we should scope this PR to
This PR implements the following aspects of ZEP9 (Phase 1)
This will help to enable Zarr extensions.
TODOs in code:
TODO:
docs/user-guide/*.rst
changes/