-
Notifications
You must be signed in to change notification settings - Fork 47
Add a function to produce a ManifestStore from HDF5 files #516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FYI I'll return to this PR later to fix the failing test - the issue is that we now have support for multiple chunk_key_encodings but no way yet to specify which is used in the Zarr V3 array metadata. |
This seems good - you could potentially go the whole way to using |
The abstraction described here is a necessary prerequisite. Basically we need a utility function like While there's a lot of code in https://github.com/developmentseed/obstore/blob/main/obstore/python/obstore/fsspec.py that would help the development of that function, I think it's a big enough task to warrant its own PR. Somewhat relatedly, I'm planning to take tomorrow off and don't want to slow you down. So, my vote would be for you or @sharkinsspatial to take over this PR if you'd want to make progress before Monday. |
The typing error is because we don't have a way to handle "empty" chunk manifests in a ManifestArray. We need an analogous solution to the current approach of an empty numpy array in a xarray Variable, but I don't understand the circumstances requiring an empty chunk manifest well enough to propose a solution. |
70c2397
to
723ed33
Compare
for more information, see https://pre-commit.ci
@staticmethod | ||
def _construct_manifest_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every staticmethod
here aside from open_virtual_dataset
may as well just be an actual function. You will need to do that anyway later to refactor to make the reader just one function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you could leave that to do in #498.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 That is the plan. I think @maxrjones intent here was to make a very small change set to demonstrate the feasibility of using ManifestStore
and that we would refactor the structure in a PR for #498.
manifest = HDFVirtualBackend._dataset_chunk_manifest( | ||
path=empty_dataset_hdf5_file, dataset=ds | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally it's better to write tests that use higher-level public API instead of directly calling internals if you can help it, but that's just a nit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I think the tests will be easier to structure when we move away from VirtualBacked
base to something more functional and the returned ManifestStore
will be simpler to introspect and reason about than a complete xr.Dataset
.
First draft of first steps for addressing #498, based on changes originally proposed in #490