Adding support for reading only chunks and various pieces of the H5Py lower level interface by bnlawrence · Pull Request #5 · NCAS-CMS/pyfive-fork-obsolete

bnlawrence · 2024-04-26T11:04:07Z

The key changes are the introduction of a subclass of DataObjects which represents a singleton variable DataObject, and a new H5D class which provides the lower level chunk interface. Arguably these two classes could be combined.

valeriupredoi

first set of review comments @bnlawrence - mostly very minor stuff 😁 (review stopped at h5d.py)

pyfive/__init__.py

pyfive/dataobjects.py

valeriupredoi

Done with the review @bnlawrence - really awesome stuff here! I think we should approach this in a structured manner:

- get this PR in your main fork's master (after you've done whatever changes you think need be done from my review - I can also help with a re-review at the very end)
- crack open a PR from your fork to the main Pyfive repo - provided that all tests pass, and we write a very descriptive PR note (that can also go in their docs - which they don't really have, but that would be a good start)
- offer me and/or you and/or @davidhassell as maintainers of their package - on conda-feedstock (so we get our men in the loop)
- offer you as Pyfive core dev (or whatever they call it, main developer etc)
- note that they may complain about the Zarr module we lift and shift - we ought to think about that

Cheers and have fun Swifting 🤣

pyfive/h5d.py

valeriupredoi · 2024-06-06T14:36:11Z

pyfive/h5d.py

+from .indexing import OrthogonalIndexer, ZarrArrayStub
+from .btree import BTreeV1RawDataChunks
+
+StoreInfo = namedtuple('StoreInfo',"chunk_offset filter_mask byte_offset size")


Suggested change

-StoreInfo = namedtuple('StoreInfo',"chunk_offset filter_mask byte_offset size")
+StoreInfo = namedtuple('StoreInfo', "chunk_offset filter_mask byte_offset size")
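For context, a minimal sketch of how a record like `StoreInfo` could back a chunk lookup by coordinate. Only the `StoreInfo` definition and its four field names come from the diff; `build_index`, `chunk_info_by_coord`, and the example offsets and sizes are hypothetical illustrations, not pyfive's or h5py's implementation.

```python
from collections import namedtuple

# Definition as in the suggested change above.
StoreInfo = namedtuple('StoreInfo', "chunk_offset filter_mask byte_offset size")


def build_index(records):
    """Hypothetical helper: map each chunk's array offset to its StoreInfo.

    `records` is an iterable of (chunk_offset, filter_mask, byte_offset, size).
    """
    index = {}
    for offset, mask, addr, size in records:
        index[tuple(offset)] = StoreInfo(tuple(offset), mask, addr, size)
    return index


def chunk_info_by_coord(index, chunk_shape, coord):
    """Hypothetical lookup: StoreInfo of the chunk holding array element `coord`."""
    # Round each coordinate down to the origin of the chunk that contains it.
    origin = tuple((c // n) * n for c, n in zip(coord, chunk_shape))
    try:
        return index[origin]
    except KeyError:
        raise KeyError(
            f"no stored chunk at offset {origin} for coordinate {tuple(coord)}"
        ) from None


# Illustrative values only: two 10x10 chunks with made-up addresses and sizes.
index = build_index([((0, 0), 0, 4096, 512), ((0, 10), 0, 4608, 480)])
info = chunk_info_by_coord(index, (10, 10), (3, 14))
```

Here element `(3, 14)` falls in the chunk whose origin is `(0, 10)`, so `info.byte_offset` and `info.size` describe the byte range a reader would need to fetch for that chunk.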

pyfive/h5d.py

valeriupredoi · 2024-06-06T14:49:13Z

pyfive/h5d.py

+            if self.filter_pipeline is not None:
+                chunk_buffer = BTreeV1RawDataChunks._filter_chunk(chunk_buffer, filter_mask, self.filter_pipeline, self.dtype.itemsize)
+            chunk_data = np.frombuffer(chunk_buffer, dtype=self.dtype)
+            out[out_selection] = chunk_data.reshape(self._chunks, order=self._order)[chunk_selection]


this was giving us plenty of headaches in PyActive - I'd definitely encase it in a few try/excepts with plenty of descriptive exceptions

I think this should be much safer in here. I'd like it to die very publicly if it does.
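A rough sketch of what "a few try/excepts with plenty of descriptive exceptions" might look like for the three steps in the hunk above (unfilter, `np.frombuffer`, `reshape`). Everything here is an assumption for illustration: `ChunkDecodeError`, `decode_chunk`, and the `unfilter` callable are hypothetical stand-ins, and `zlib.decompress` is used in place of the real `_filter_chunk` pipeline call.

```python
import zlib

import numpy as np


class ChunkDecodeError(RuntimeError):
    """Hypothetical exception raised when a stored chunk cannot be decoded."""


def decode_chunk(chunk_buffer, dtype, chunk_shape, order="C", unfilter=None):
    """Decode one raw chunk into an array, failing loudly at each step.

    `unfilter` stands in for the filter pipeline: any callable that maps
    filtered bytes to raw bytes (an assumption, not pyfive's signature).
    """
    dtype = np.dtype(dtype)
    if unfilter is not None:
        try:
            chunk_buffer = unfilter(chunk_buffer)
        except Exception as exc:
            raise ChunkDecodeError(
                f"filter pipeline failed on a {len(chunk_buffer)}-byte chunk: {exc}"
            ) from exc
    try:
        chunk_data = np.frombuffer(chunk_buffer, dtype=dtype)
    except ValueError as exc:
        raise ChunkDecodeError(
            f"{len(chunk_buffer)} bytes is not a whole number of "
            f"{dtype.itemsize}-byte {dtype} items"
        ) from exc
    try:
        return chunk_data.reshape(chunk_shape, order=order)
    except ValueError as exc:
        raise ChunkDecodeError(
            f"chunk holds {chunk_data.size} items but chunk shape "
            f"{tuple(chunk_shape)} needs {int(np.prod(chunk_shape))}"
        ) from exc


# Usage: a zlib-compressed 2x3 chunk of little-endian int32 values.
raw = np.arange(6, dtype="<i4").tobytes()
tile = decode_chunk(zlib.compress(raw), "<i4", (2, 3), unfilter=zlib.decompress)
```

Each failure mode gets its own message, so a truncated read, a filter error, and a chunk-shape mismatch are distinguishable in the traceback rather than surfacing as a bare `ValueError` from NumPy.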

pyfive/indexing.py

tests/test_h5d.py

Bryan Lawrence and others added 29 commits February 22, 2024 12:20

Using s3 to get at some real data for testing

02fca54

Getting the address as well as size into the index

df3669a

With timer

16c0e81

Not working yet. Don't reckon I have the arguments to OrthogonalIndexer right yet.

c464be8

A few more notes in the code so I can come back to it anon.

afaa4f5

Woops. Need this.

18bc37c

First working lazy read (only reads chunks needed for selection)

4b0ac08

Woops didn't commit the real oil

5356aa0

Should now support filtering chunks in the partial chunk loading. Haven't got any tests around this yet.

9fe2394

Some additional documentation

dafb3c9

Seems to work, prior to re-integration

53e4ebe

Moved chunk support into standard API

9ac0bbd

removing playing code

a88a150

Merge branch 'jjhelmus:master' into issue6

89aafe3

Fixes bug which stops the selection read from actually occurring and changes to actually using the filter pipeline. At this point is failing test_reference.

96dc178

Hack to avoid reference datatypes in chunk by chunk selections.

eb44c15

Remove obsolete function

51f7cca

Support for third party access to contiguous data address and size. Also remove list definition which breaks references.

1f61d6c

First cut, fails references and classic, even with new stuff turned off?

e6217b5

This version appears to now support failing over from a memory map to a pseudo chunked read. Lots of things to do around optimising that read, but let's test this more widely first.

67c93e0

First cut, no tests yet

a08ee20

Improvements

dc00503

With some failing tests

9ffb5b2

Fixed one test

223a931

All tests for new functionality pass, but I've broken something old

3a256ab

Now passing all tests

32d83dd

Checking coverage of get_chunk_info_by_coord(method)

f5f89c5

Missing docstring

2c8f59c

Cleaning up

013ce62

bnlawrence assigned valeriupredoi Apr 26, 2024

bnlawrence requested a review from valeriupredoi April 26, 2024 11:22

valeriupredoi requested changes Jun 6, 2024


valeriupredoi approved these changes Jun 6, 2024


bnlawrence merged commit 400c798 into master Jul 9, 2024
