
Test case for corner case file (buffer too small) #160

Merged
valeriupredoi merged 8 commits into main from add_mogill_corner_test
Dec 16, 2025

Conversation

@valeriupredoi
Collaborator

@valeriupredoi valeriupredoi commented Dec 10, 2025

Description

@mo-gill raised an issue similar to the one we saw and fixed in #100, but with a slightly different type of buffer-related problem, and a different file. I added the offending file in a test case here so we have full control over the use case. I suspect it's yet again a NULL vector that's causing it, but I've not looked closely.

Closes #158

Before you get started

Checklist

  • This pull request has a descriptive title and labels
  • This pull request has a minimal description (most was discussed in the issue, but a two-liner description is still desirable)
  • Unit tests have been added (if codecov test fails)
  • All tests pass

@kmuehlbauer
Collaborator

@valeriupredoi This was a bit harder than expected. We were in need of some more bookkeeping, but I hope the solution is good enough.

@codecov

codecov bot commented Dec 12, 2025

Codecov Report

❌ Patch coverage is 45.00000% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.83%. Comparing base (68add89) to head (363e46e).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
pyfive/utilities.py 0.00% 26 Missing ⚠️
pyfive/misc_low_level.py 84.37% 3 Missing and 2 partials ⚠️
pyfive/dataobjects.py 0.00% 1 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (45.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #160      +/-   ##
==========================================
+ Coverage   76.18%   76.83%   +0.65%     
==========================================
  Files          14       15       +1     
  Lines        2893     2936      +43     
  Branches      459      467       +8     
==========================================
+ Hits         2204     2256      +52     
+ Misses        564      558       -6     
+ Partials      125      122       -3     

make function private
@kmuehlbauer kmuehlbauer self-requested a review December 12, 2025 12:58
Collaborator Author

@valeriupredoi valeriupredoi left a comment


wow this was a wee bit of a proper corner case, many thanks @kmuehlbauer
@bnlawrence if you have a minute quickly look this over, and merge, please 🍻

@kmuehlbauer
Collaborator

I'm thinking about adding a test which creates this kind of FractalHeap layout from scratch.

@bnlawrence
Collaborator

@kmuehlbauer Thanks for jumping in on this, it was on my to-do for today. If I've understood what you've suggested here, you've dealt with two issues: one is the "nothing there" case, which you've pretty much handled by using continue instead of break; the other is the bookkeeping around not properly getting all the heap information. Does our test case cover both, or have I got that wrong?

@kmuehlbauer
Collaborator

@bnlawrence I'm pretty much into this now. I've discovered another "flaw" when checking for direct/indirect FHDB/FHIB nesting. I'll try to get this fixed too, and add a dedicated test.

@valeriupredoi
Collaborator Author

superb work @kmuehlbauer - and a fair bit beyond my expertise, so many thanks! It'd be good to have another netCDF4 test file to reflect the second corner/use case you mention, and keep intact the "nothing there" one supplied by @mo-gill that I popped in here 🍻

… all possible blocks, add first fractal heap tests
if nrows <= ndirect_max:
# this info cannot tell us the precise number of blocks;
# it can only tell us the maximum possible number we should parse
if nobjects <= ndirect_max:
@kmuehlbauer kmuehlbauer Dec 15, 2025

Third "flaw":
This prevented decoding, so we need to take table_width into account.
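To make the table_width point concrete, here is a minimal sketch of the bookkeeping. It is not pyfive's implementation: the function name and the parameter values are illustrative, based on the HDF5 fractal heap doubling-table layout (rows 0 and 1 hold blocks of the starting size, each later row doubles the block size, and every row holds `table_width` block addresses).

```python
from math import log2

def max_direct_blocks(table_width, starting_block_size, max_direct_size):
    # Number of rows that can hold direct blocks: rows 0 and 1 use the
    # starting size, then each row doubles until max_direct_size is reached.
    max_direct_rows = int(log2(max_direct_size) - log2(starting_block_size)) + 2
    # Each row holds table_width entries, so the row count alone
    # underestimates the number of block slots to parse.
    return max_direct_rows * table_width

# Illustrative parameters: width 4, 512-byte starting blocks, 64 KiB max
print(max_direct_blocks(4, 512, 64 * 1024))  # 36 direct-block slots, not 9
```

The point of the fix as described above: comparing against the row count without multiplying by `table_width` undercounts the slots and prevented decoding.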


Note: this was only a problem for many, many, many attributes and/or very, very large attributes

@kmuehlbauer kmuehlbauer left a comment

@bnlawrence Added some code comments, as well as comments on the PR.

Hope you find your way through the FractalHeap 🎉


@pytest.mark.parametrize("payload_size", [4033, 4032])
@pytest.mark.parametrize("n_attrs", [10, 11])
def test_huge_object(name, payload_size, n_attrs):

This is something I stumbled over while trying to generate a test. These figures work on my Linux box, but we might need to generalize better.

for k, v in attrs.items():
np.testing.assert_equal(v, attrs2[k])

@pytest.mark.parametrize("n_attrs", [115, 116])

If you debug iterating the fractal heap, you will see that with 115 attributes there is no further FHIB, whereas with 116 there is another FHIB with one nested FHDB. There seems to be no way of checking other than iterating again.
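As a toy model of the nesting described above (not pyfive's actual parser): an indirect block (FHIB) is a table of entries, each of which may be a direct block (FHDB), an unallocated slot, or another FHIB that must be recursed into. Here lists stand in for FHIBs, strings for FHDBs, and `None` for unset slots.

```python
def walk_blocks(block, depth=0):
    """Yield (depth, name) for every direct block reachable from `block`.

    Toy model: a list is an indirect block (FHIB), a string is a direct
    block (FHDB), None is an unallocated slot.
    """
    for entry in block:
        if isinstance(entry, list):
            # nested indirect block: recurse one level deeper
            yield from walk_blocks(entry, depth + 1)
        elif entry is not None:
            yield (depth, entry)
        # entry is None -> unset slot: skip it, but keep iterating

# ~115 attributes: root FHIB holds only direct blocks
small = ["FHDB0", "FHDB1", None]
# ~116 attributes: root FHIB gains a nested FHIB with one more FHDB
large = ["FHDB0", "FHDB1", None, ["FHDB2"]]

print(list(walk_blocks(small)))  # [(0, 'FHDB0'), (0, 'FHDB1')]
print(list(walk_blocks(large)))  # adds (1, 'FHDB2') from the nested FHIB
```

Only by walking every entry (iterating again) does the nested FHDB show up; there is no count in the root block that reveals it up front.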

address = struct.unpack('<Q', fh.read(8))[0]
if address == UNDEFINED_ADDRESS:
break
# if there is no valid address, we move on to the next

So, this was the first issue: there are cases where attributes are in higher rows even if lower rows are not set, so iteration over all possible addresses is needed!

address = struct.unpack('<Q', fh.read(8))[0]
if address == UNDEFINED_ADDRESS:
break
# same here, move on to the next address

Same here.

return self.managed[offset:offset+size]

# map heap_id offset to flat buffer offset
offset = self._heapid_to_buffer_offset(offset)

So, this is the second issue: the original code extracts offset and size into the heap, not into our flat managed buffer. When we have attributes in higher rows, this would break. We somehow have to get the correct addresses; we do that by storing them on the heap object.
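A minimal sketch of the translation this describes, under assumed toy structures (the class and method names here are illustrative, not pyfive's): direct blocks live at arbitrary heap offsets but are read into one flat buffer, so a heap id's (offset, size) must be mapped from heap space into buffer space using the stored block addresses.

```python
from bisect import bisect_right

class FlatHeap:
    """Toy sketch: translate heap offsets into offsets in a flat buffer."""

    def __init__(self, blocks):
        # blocks: list of (heap_offset, payload bytes), in heap-offset order
        self.heap_offsets = [off for off, _ in blocks]
        self.buffer_starts = []   # where each block starts in `managed`
        self.managed = b""
        for _, payload in blocks:
            self.buffer_starts.append(len(self.managed))
            self.managed += payload

    def _heapid_to_buffer_offset(self, offset):
        # find the block containing this heap offset, then re-base it
        i = bisect_right(self.heap_offsets, offset) - 1
        return self.buffer_starts[i] + (offset - self.heap_offsets[i])

    def read(self, offset, size):
        offset = self._heapid_to_buffer_offset(offset)
        return self.managed[offset:offset + size]

# Two direct blocks at non-contiguous heap offsets
heap = FlatHeap([(4096, b"abcd"), (12288, b"wxyz")])
print(heap.read(12290, 2))  # b'yz' -- heap offset 12290 falls in block 2
```

Without the translation step, using the heap offset directly as a buffer offset only works while all blocks happen to start at the beginning of the flat buffer, which is exactly what fails once attributes land in higher rows.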

@bnlawrence
Collaborator

Again, thanks @kmuehlbauer. I've been down in the bowels for a few hours, and I think you've done a great job. In trying to understand it, I think I can see a couple of optimisations, but they are the sort that might look good on paper and achieve nothing in practice, so I am minded not to do anything now. (For the record, the direction of travel would be to try to avoid all the little reads, and to improve the efficiency of the heap-id-to-offset calculation ... though that's the bit I suspect matters more in theory than in practice, especially for NetCDF4, where this is only attribute stuff and so won't be too voluminous, I expect.)

@bnlawrence
Collaborator

I am going to push up some more documentation and a utility, to help us with improving this at some future time, but in terms of your changes, I'll approve this for merge as soon as I've done that.

@bnlawrence bnlawrence left a comment

Ok, as noted above, I think this is good to go!

@valeriupredoi
Collaborator Author

brilliant, great many thanks @kmuehlbauer and @bnlawrence 🍻 Kai, could I please get an Approve from you on this? Then I could merge 🫡

@valeriupredoi
Collaborator Author

superb work gents! 🥳

@valeriupredoi valeriupredoi merged commit a83d389 into main Dec 16, 2025
6 of 7 checks passed
@valeriupredoi valeriupredoi deleted the add_mogill_corner_test branch December 16, 2025 12:54
@valeriupredoi valeriupredoi mentioned this pull request Dec 18, 2025
5 tasks

Successfully merging this pull request may close these issues.

Possible pyfive bug/edge case