Allow HDF Groups #424
Merged
Changes from all commits (10 commits):
91939f4 Allow regex filter in HDF (martindurant)
183d126 actual module (martindurant)
2075162 Allow passing an h5py object (martindurant)
b388d95 Allow passing in HDF Dataset (martindurant)
2cfd218 hack names (martindurant)
bbd4105 remove hdf dataset route after all (martindurant)
ecc5f49 Update kerchunk/hdf.py (martindurant)
a33afb2 Update kerchunk/hdf.py (martindurant)
9280b26 Update kerchunk/hdf.py (martindurant)
b4fc62d omit remaking HDF File (martindurant)
Conversation
I don't think you need these two lines anymore (they certainly mess up my use case, where the file is an S3 object). Since the file is loaded as a file object up in the first branch of the conditional, if h5f is an h5py.Group then it should be kept that way, with self._h5f set to it.
_h5f is indeed set to the input two lines above. This exists for any inlining that might happen, which requires getting bytes directly from the original file, not going via h5py.
What happens? I think providing the URL/options will certainly be required.
In my case it's looking for a local file even if I pass valid S3 storage_options. Leave it like this for now; I'll need to do a wee bit more testing to understand what's going on, and will get back to you if Kerchunk needs changing 👍
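For context, the kind of configuration under discussion might look like the following. The endpoint and flags here are placeholders, not the reporter's actual settings:

```python
# Hypothetical storage_options of the kind the comment describes: with
# s3fs/fsspec, a non-AWS endpoint goes under client_kwargs, separate
# from the bucket/key carried in the URL itself.
storage_options = {
    "anon": False,
    "client_kwargs": {"endpoint_url": "https://object-store.example.invalid"},
}
# e.g. SingleHdf5ToZarr("s3://bnl/da193a_25_day__198807-198807.nc",
#                       storage_options=storage_options)
print(storage_options["client_kwargs"]["endpoint_url"])
```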
The URL starts with "s3://"?
Yes and no 🤣 It's a very peculiar bucket: s3fs needs a particular storage options dict and a particular call to be able to read it, but file_url needs to be the truncated form (bucket + file name), i.e. bnl/da193a_25_day__198807-198807.nc in this case, and s3fs assembles the full URL from the endpoint URL and that truncated bucket + filename. It's odd, and I'm not 100% sure why this type of S3 storage wants that configuration, but the bottom line is that when Kerchunk tries to open it as a regular S3 file, it doesn't work. Even if I prepend a correct full s3://... path to the file, I get Forbidden access, since the storage identification is done wrongly.
This is definitely not the right URL: the first part should be the bucket, not a server name (I'm surprised it even attempts to connect). The URL should be "s3://bnl/da193a_25_day__198807-198807.nc", as the server/endpoint is already included in the storage options.
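The convention described above can be illustrated with a small stdlib-only sketch: the s3:// URL carries bucket and key only, and the filesystem layer joins in the endpoint from the storage options (path-style addressing assumed; the endpoint below is hypothetical):

```python
# Stdlib-only illustration of the URL convention described above: the
# s3:// URL carries bucket and key only; the server/endpoint comes from
# storage options and is joined in by the filesystem layer.
from urllib.parse import urlsplit


def resolve_object_url(s3_url: str, endpoint_url: str) -> str:
    """Join an s3://bucket/key URL with an endpoint (path-style)."""
    parts = urlsplit(s3_url)
    if parts.scheme != "s3":
        raise ValueError("expected an s3:// URL")
    return f"{endpoint_url.rstrip('/')}/{parts.netloc}{parts.path}"


print(resolve_object_url(
    "s3://bnl/da193a_25_day__198807-198807.nc",
    "https://object-store.example.invalid",  # hypothetical endpoint
))
```

This is why prepending the server name to the path fails: the first URL component is parsed as the bucket, so a server name there misidentifies the storage.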
Blast! That worked! I knew I wasn't doing something right 😆
Though I am getting fairly long times from visititems(): very much comparable to the times when kerchunking is done not on a single group but on the entire file.
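The expectation behind that timing complaint can be sanity-checked with plain h5py: visiting a group walks only that group's subtree, so group-level conversion should be cheaper once the group is not silently reopened as a whole file. A sketch with an invented file layout:

```python
# h5py-only check that Group.visititems() is scoped to the group: only
# names under "grp" are visited, not siblings elsewhere in the file.
# The file layout here is invented for illustration.
import os
import tempfile

import h5py
import numpy as np

fd, path = tempfile.mkstemp(suffix=".h5")
os.close(fd)
with h5py.File(path, "w") as f:
    f.create_dataset("other/big", data=np.zeros(4))
    f.create_dataset("grp/x", data=np.arange(4.0))

seen = []
with h5py.File(path, "r") as f:
    # Names passed to the callback are relative to the group.
    f["grp"].visititems(lambda name, obj: seen.append(name))

print(seen)  # only names under "grp", not "other/big"
```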
Ah, that's because this self._h5f = h5py.File(self.input_file, mode="r") is a few lines down 😁
(oops, fixed)