Skip to content

full feature COMPOUND type#120

Closed
kmuehlbauer wants to merge 1 commit intoNCAS-CMS:mainfrom
kmuehlbauer:compound-with-references
Closed

full feature COMPOUND type#120
kmuehlbauer wants to merge 1 commit intoNCAS-CMS:mainfrom
kmuehlbauer:compound-with-references

Conversation

@kmuehlbauer
Copy link
Collaborator

  • fix complex compound
  • fix nested references

Description

This enables the COMPOUND type to work for all compounds. It fixes some aspects of Complex Compound and it enables proper access of nested REFERENCE type. This is especially needed for HDF5/NetCDF4 dimension scale usage.

Closes #119

Before you get started

Checklist

  • This pull request has a descriptive title and labels
  • This pull request has a minimal description (most was discussed in the issue, but a two-liner description is still desirable)
  • Unit tests have been added (if codecov test fails)
  • Any changed dependencies have been added or removed correctly (if need be)
  • If you are working on the documentation, please ensure the current build passes
  • All tests pass

@codecov
Copy link

codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 83.67347% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.05%. Comparing base (8a44c01) to head (ee6ddb7).
⚠️ Report is 228 commits behind head on main.

Files with missing lines Patch % Lines
pyfive/datatype_msg.py 77.77% 2 Missing and 4 partials ⚠️
pyfive/dataobjects.py 90.90% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #120      +/-   ##
==========================================
- Coverage   74.05%   74.05%   -0.01%     
==========================================
  Files          11       11              
  Lines        2598     2636      +38     
  Branches      406      415       +9     
==========================================
+ Hits         1924     1952      +28     
- Misses        565      570       +5     
- Partials      109      114       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kmuehlbauer
Copy link
Collaborator Author

@bnlawrence Maybe you find the time to review. I tried to add comments to make certain changes more obvious. Hopefully these are the final changes and h5netcdf is ready now.

try:
dtype = DatatypeMessage(buffer, offset).dtype
except NotImplementedError:
if name == 'REFERENCE_LIST':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely my fault. Should have returned None.

Copy link
Collaborator

@bnlawrence bnlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I can't properly review this without sitting with a debugger and working through it, and I can't do that for a wee while. @bmaranville you did a lot of the compound stuff. Do you have time to take a look?

'formats': formats,
'offsets': _dtype[2],
'itemsize': _dtype[3],
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that there can be no other type of tuple appearing in a compound? E.g. could someone put an enumeration in here? I'd be tempted to trap any other type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct - compound can be nested with almost anything (e.g. it can contain other compound types, too!), which is why the datatype lookup is recursive:

comp_dtype = self.determine_dtype()

The handling here doesn't reflect that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, didn't test that. I'll have a look at h5py/hdf5/netcdf for some examples on that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmaranville Thanks for highlighting that. My main task was to fix this for REFERENCE_LIST (dimension scale) compounds.

is_ref = [isinstance(d, tuple) and d[0] == "REFERENCE" for d in _dtype[1]]
has_ref = any(is_ref)
formats = [f"V{d[1]}" if is_ref[i] else d for i, d in enumerate(_dtype[1])]
dtype = np.dtype(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure exactly what is going on, but is this effectively saying that if you have a reference here, you'll simply say it's something binary that you don't know about? So this isn't dereferencing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the type get sorted later when being de-referenced?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Reference just contains an address (pointer) - it doesn't know the type of what it's pointing to. Note that when you get a "REFERENCE" in a datatype message, you have to read the bit fields to determine if it is an object reference (bits 0-3 == 0) or region reference (bits 0-3 == 1), see https://support.hdfgroup.org/documentation/hdf5/latest/_f_m_t11.html#subsubsec_fmt11_dataobject_hdr_dtmessage

A region reference is handled differently from an object reference, and region reference handling is not implemented in pyfive at this time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, after reading the data, we transform the void buffer into pyfive.Reference. Maybe we can sort out a better way later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then I'll need to make this work only for object references. Thanks @bmaranville.

@bnlawrence
Copy link
Collaborator

To be honest, I can't properly review this without sitting with a debugger and working through it, and I can't do that for a wee while. @bmaranville you did a lot of the compound stuff. Do you have time to take a look?

I should say that what I can understand makes sense to me, so if Brian doesn't have anything to add, or doesn't have time, I'd say go ahead and merge: it doesn't break anything more, and it fixes some things, and I don't see any obvious gotchas.

Copy link
Collaborator

@bmaranville bmaranville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the "has_ref" boolean flag in the reading will be effective, since compound types can be (nearly) infinitely nested, so just knowing if there is a ref somewhere inside seems insufficient.

is_ref = [isinstance(d, tuple) and d[0] == "REFERENCE" for d in _dtype[1]]
has_ref = any(is_ref)
formats = [f"V{d[1]}" if is_ref[i] else d for i, d in enumerate(_dtype[1])]
dtype = np.dtype(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Reference just contains an address (pointer) - it doesn't know the type of what it's pointing to. Note that when you get a "REFERENCE" in a datatype message, you have to read the bit fields to determine if it is an object reference (bits 0-3 == 0) or region reference (bits 0-3 == 1), see https://support.hdfgroup.org/documentation/hdf5/latest/_f_m_t11.html#subsubsec_fmt11_dataobject_hdr_dtmessage

A region reference is handled differently from an object reference, and region reference handling is not implemented in pyfive at this time.

'formats': formats,
'offsets': _dtype[2],
'itemsize': _dtype[3],
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct - compound can be nested with almost anything (e.g. it can contain other compound types, too!), which is why the datatype lookup is recursive:

comp_dtype = self.determine_dtype()

The handling here doesn't reflect that.

@kmuehlbauer
Copy link
Collaborator Author

I'm not sure the "has_ref" boolean flag in the reading will be effective, since compound types can be (nearly) infinitely nested, so just knowing if there is a ref somewhere inside seems insufficient.

Yes, thanks for showing the issue with the current approach. I'll need to think about this a bit more.

@bnlawrence
Copy link
Collaborator

I think it would be entirely reasonable to constrain this solution to what you need to do to get dimension scales working, and to raise errors for more deeply nested references and/or unhandled types of complexity.

@kmuehlbauer
Copy link
Collaborator Author

I'm sorry. I could not follow the current datatype within a tuple approach. I've taken the time to revamp those types. Please have a look at the other PR #122.

@valeriupredoi valeriupredoi added this to the 1.0 milestone Oct 17, 2025
@bnlawrence
Copy link
Collaborator

@kmuehlbauer Is this still live, or was it all dealt with in #122?

@kmuehlbauer
Copy link
Collaborator Author

This was actually overridden by #122. Closing.

@kmuehlbauer kmuehlbauer deleted the compound-with-references branch October 27, 2025 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue with COMPOUND including REFERENCE

4 participants