Skip to content

Adding support for reading only chunks and various pieces of the H5Py lower level interface#5

Merged
bnlawrence merged 29 commits intomasterfrom
h5pyapi
Jul 9, 2024
Merged

Adding support for reading only chunks and various pieces of the H5Py lower level interface#5
bnlawrence merged 29 commits intomasterfrom
h5pyapi

Conversation

@bnlawrence
Copy link
Collaborator

The key changes are the introduction of a subclass of DataObjects which represents a singleton variable DataObject, and a new H5D class which provides the lower level chunk interface. Arguably these two classes could be combined.

Bryan Lawrence and others added 29 commits February 22, 2024 12:20
…changes to actually using the filter pipeline. At this point is failling test_reference.
…lso remove list definition which breaks references.
… a pseudo chunked read. Lots of things to do around optimising that read, but let's test this more widely first.
Copy link
Collaborator

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first set of review comments @bnlawrence - mostly very minor stuff 😁 )review stopped at h5d.py)

Copy link
Collaborator

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with the review @bnlawrence - really awesome stuff here! I think we should approach this in a structured manner:

  • get this PR in your main fork's master (after you've done whatever changes you think need be done from my review - I can also help with a re-review at the very end)
  • crack open a PR from your fork to the main Pyfive repo - provided that all tests pass, and we write a very descriptive PR note (that can also go in their docs - which they don't really have, but that would be a good start)
  • offer me and/or you and/or @davidhassell as maintainers of their package - on conda-feedstock (so we get our men in the loop)
  • offer you as Pyfive core dev (or whatever they call it, main developer etc)
  • note that they may complain about the Zarr module we lift and shift - we ought to think about that

Cheers and have fun Swifting 🤣

from .indexing import OrthogonalIndexer, ZarrArrayStub
from .btree import BTreeV1RawDataChunks

StoreInfo = namedtuple('StoreInfo',"chunk_offset filter_mask byte_offset size")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
StoreInfo = namedtuple('StoreInfo',"chunk_offset filter_mask byte_offset size")
StoreInfo = namedtuple('StoreInfo', "chunk_offset filter_mask byte_offset size")

if self.filter_pipeline is not None:
chunk_buffer = BTreeV1RawDataChunks._filter_chunk(chunk_buffer, filter_mask, self.filter_pipeline, self.dtype.itemsize)
chunk_data = np.frombuffer(chunk_buffer, dtype=self.dtype)
out[out_selection] = chunk_data.reshape(self._chunks, order=self._order)[chunk_selection]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was giving us plenty of headaches in PyActive - I'd definitely encase it in a few try/excepts with plenty of descriptive exceptions

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be much safer in here. I'd like it to die very publicly if it does.

@bnlawrence bnlawrence merged commit 400c798 into master Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants