Conversation
… fix nested references
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #120 +/- ##
==========================================
- Coverage 74.05% 74.05% -0.01%
==========================================
Files 11 11
Lines 2598 2636 +38
Branches 406 415 +9
==========================================
+ Hits 1924 1952 +28
- Misses 565 570 +5
- Partials 109 114 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@bnlawrence Maybe you find the time to review. I tried to add comments to make certain changes more obvious. Hopefully these are the final changes and |
| try: | ||
| dtype = DatatypeMessage(buffer, offset).dtype | ||
| except NotImplementedError: | ||
| if name == 'REFERENCE_LIST': |
There was a problem hiding this comment.
Absolutely my fault. Should have returned None.
bnlawrence
left a comment
There was a problem hiding this comment.
To be honest, I can't properly review this without sitting with a debugger and working through it, and I can't do that for a wee while. @bmaranville you did a lot of the compound stuff. Do you have time to take a look?
| 'formats': formats, | ||
| 'offsets': _dtype[2], | ||
| 'itemsize': _dtype[3], | ||
| }) |
There was a problem hiding this comment.
Are we sure that there can be no other type of tuple appearing in a compound? E.g. could someone put an enumeration in here? I'd be tempted to trap any other type.
There was a problem hiding this comment.
You are correct - compound can be nested with almost anything (e.g. it can contain other compound types, too!), which is why the datatype lookup is recursive:
Line 141 in ee6ddb7
The handling here doesn't reflect that.
There was a problem hiding this comment.
Yes, didn't test that. I'll have a look at h5py/hdf5/netcdf for some examples on that.
There was a problem hiding this comment.
@bmaranville Thanks for highlighting that. My main task was to fix this for REFERENCE_LIST (dimension scale) compounds.
| is_ref = [isinstance(d, tuple) and d[0] == "REFERENCE" for d in _dtype[1]] | ||
| has_ref = any(is_ref) | ||
| formats = [f"V{d[1]}" if is_ref[i] else d for i, d in enumerate(_dtype[1])] | ||
| dtype = np.dtype( |
There was a problem hiding this comment.
I'm not sure exactly what is going on, but is this effectively saying that if you have a reference here, you'll simply say it's something binary that you don't know about? So this isn't dereferencing?
There was a problem hiding this comment.
Does the type get sorted later when being de-referenced?
There was a problem hiding this comment.
A Reference just contains an address (pointer) - it doesn't know the type of what it's pointing to. Note that when you get a "REFERENCE" in a datatype message, you have to read the bit fields to determine if it is an object reference (bits 0-3 == 0) or region reference (bits 0-3 == 1), see https://support.hdfgroup.org/documentation/hdf5/latest/_f_m_t11.html#subsubsec_fmt11_dataobject_hdr_dtmessage
A region reference is handled differently from an object reference, and region reference handling is not implemented in pyfive at this time.
There was a problem hiding this comment.
Yes, after reading the data, we transform the void buffer into pyfive.Reference. Maybe we can sort out a better way later.
There was a problem hiding this comment.
OK, then I'll need to make this work only for object references. Thanks @bmaranville.
I should say that what I can understand makes sense to me, so if Brian doesn't have anything to add, or doesn't have time, I'd say go ahead and merge: it doesn't break anything more, and it fixes some things, and I don't see any obvious gotchas. |
bmaranville
left a comment
There was a problem hiding this comment.
I'm not sure the "has_ref" boolean flag in the reading will be effective, since compound types can be (nearly) infinitely nested, so just knowing if there is a ref somewhere inside seems insufficient.
| is_ref = [isinstance(d, tuple) and d[0] == "REFERENCE" for d in _dtype[1]] | ||
| has_ref = any(is_ref) | ||
| formats = [f"V{d[1]}" if is_ref[i] else d for i, d in enumerate(_dtype[1])] | ||
| dtype = np.dtype( |
There was a problem hiding this comment.
A Reference just contains an address (pointer) - it doesn't know the type of what it's pointing to. Note that when you get a "REFERENCE" in a datatype message, you have to read the bit fields to determine if it is an object reference (bits 0-3 == 0) or region reference (bits 0-3 == 1), see https://support.hdfgroup.org/documentation/hdf5/latest/_f_m_t11.html#subsubsec_fmt11_dataobject_hdr_dtmessage
A region reference is handled differently from an object reference, and region reference handling is not implemented in pyfive at this time.
| 'formats': formats, | ||
| 'offsets': _dtype[2], | ||
| 'itemsize': _dtype[3], | ||
| }) |
There was a problem hiding this comment.
You are correct - compound can be nested with almost anything (e.g. it can contain other compound types, too!), which is why the datatype lookup is recursive:
Line 141 in ee6ddb7
The handling here doesn't reflect that.
Yes, thanks for showing the issue with the current approach. I'll need to think about this a bit more. |
|
I think it would be entirely reasonable to constrain this solution to what you need to do to get dimension scales working, and to raise errors for more deeply nested references and/or unhandled types of complexity. |
|
I'm sorry. I could not follow the current datatype within a tuple approach. I've taken the time to revamp those types. Please have a look at the other PR #122. |
|
@kmuehlbauer Is this still live, or was it all dealt with in #122? |
|
This was actually overridden by #122. Closing. |
Description
This enables the COMPOUND type to work for all compounds. It fixes some aspects of Complex Compound and it enables proper access of nested REFERENCE type. This is especially needed for HDF5/NetCDF4 dimension scale usage.
Closes #119
Before you get started
Checklist