v3 spec: Consider removing metadata encoding #174
Assuming we keep `metadata_encoding` at all, I think that `metadata_key_suffix` can be skipped altogether, rather than embedded as a member of `metadata_encoding`. The specification for each metadata encoding type would simply state what the key suffix is. An implementation that does not know about a given `metadata_encoding` does not benefit from knowing the key suffix, since it still can't decode the metadata.

As for #37, the original request was to be able to encode metadata as HDF5 attributes, which have a different data model. I'm not clear exactly how that request fits in with zarr v3, but I don't think `metadata_encoding` is actually helpful for that purpose.

As for #141, I also don't think `metadata_encoding` is a great solution if regular JSON, with its limitations, remains the default; in fact, I think it would be confusing and problematic if the representable values varied depending on the `metadata_encoding`.
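A minimal sketch of this alternative, assuming a hypothetical registry of encodings (the `msgpack` entry is purely illustrative, not a registered encoding):

```python
# Hypothetical registry: each metadata encoding *specification* fixes its own
# key suffix, so metadata_key_suffix never needs to appear in the metadata.
KEY_SUFFIX_BY_ENCODING = {
    "json": ".json",        # the default encoding
    "msgpack": ".msgpack",  # illustrative only
}

def metadata_key(base_key: str, encoding: str) -> str:
    # An implementation that does not recognize `encoding` fails here,
    # which is fine: it could not have decoded the document anyway.
    return base_key + KEY_SUFFIX_BY_ENCODING[encoding]
```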
What if we made metadata encoding up to the store? For some stores (filesystems, object stores), JSON is a natural choice. For other stores (e.g. document databases), native storage of dictionaries would be more natural.
How would calling code know which store to use? |
I think you might have intended to ask: "How would the calling code know which encoding to use?" Conceptually, we could say that the store just decodes and re-encodes when writing, and does the reverse when reading, based on the key requested. Implementations of zarr could choose to make this more efficient by adding to their store abstraction an interface for reading and writing in-memory JSON values directly, in order to avoid the extra encoding and decoding. That would also be necessary if the value cannot be represented as JSON (e.g. nan, infinity, specific nan values), though we might want to avoid such values precisely to ensure consistency across metadata encodings.
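A rough sketch of that idea; the class and method names here are invented for illustration, not part of any zarr store API:

```python
import json
from typing import Any

class InMemoryStore:
    """Sketch: a store keeping the plain-bytes interface, plus an optional
    fast path for in-memory JSON values as described above."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def set_bytes(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get_bytes(self, key: str) -> bytes:
        return self._data[key]

    # Fast path: accept an in-memory JSON value directly. A store backed by
    # a document database could override these two methods to persist the
    # value natively and skip the encode/decode round trip entirely.
    def set_json(self, key: str, value: Any) -> None:
        # allow_nan=False rejects nan/infinity, which have no JSON
        # representation -- the consistency concern raised above.
        self.set_bytes(key, json.dumps(value, allow_nan=False).encode())

    def get_json(self, key: str) -> Any:
        return json.loads(self.get_bytes(key))
```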
Right, so the interface might look something like:

```python
from typing import Any

class Store:
    def store_meta(self, key: str, metadata: dict[str, Any]) -> None:
        ...

    def store_bytes(self, key: str, data: bytes) -> None:
        ...
```

i.e. you would pass a native dict to the store.
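To make the contrast concrete, here is a hedged sketch of two stores implementing that interface; `DocumentDBStore` and its `collection` attribute are invented stand-ins, not a real client API:

```python
import json
from typing import Any

class FilesystemStore(Store):
    # For a filesystem or object store, JSON is the natural representation,
    # so metadata is serialized and written through the ordinary byte path.
    def store_meta(self, key: str, metadata: dict[str, Any]) -> None:
        self.store_bytes(key, json.dumps(metadata).encode())

class DocumentDBStore(Store):
    # A document database can persist the dict natively; `self.collection`
    # is a hypothetical database handle used only for illustration.
    def store_meta(self, key: str, metadata: dict[str, Any]) -> None:
        self.collection.insert_one({"_id": key, **metadata})
```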
Maybe. I guess, as always, concrete examples would help. For a database, I agree it's not really our business to say how it's going in. I was more concerned about the middle ground where someone is trying to achieve …
We removed the metadata encoding for now, since a similar behavior can be specified via group extensions or group storage transformers. If the data is still stored JSON-like, it might be handled at the level of the store, as indicated in the conversation above.

I'm closing this issue for now; I'd propose to move to #37 to discuss other encodings further.
In #171 I consolidated `metadata_key_suffix` and `metadata_encoding` into a single object, which can also be an extension point (see the sketch below). @jbms raised the question of whether this explicit extension point is needed, since global extensions could define the same behavior (see conversation here). I'll try to summarize the pros and cons of removing it:

Pro:

Contra:

- The `metadata_encoding` extension point explicitly makes implementors more aware of this being a possible extension, and might make adapting further extensions easier.
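For reference, a rough sketch of what the consolidated object might look like in group metadata; the field names are assumptions for illustration, not quoted from #171:

```python
# Hypothetical consolidated metadata_encoding object: the former top-level
# metadata_key_suffix now lives inside it. Names are illustrative only.
group_metadata = {
    "metadata_encoding": {
        "type": "json",                  # which encoding is in use
        "metadata_key_suffix": ".json",  # previously a separate top-level field
    },
}
```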