
Test case for corner case file (buffer too small) #160

Merged
valeriupredoi merged 8 commits into main from add_mogill_corner_test
Dec 16, 2025

Conversation

@valeriupredoi
Collaborator

@valeriupredoi valeriupredoi commented Dec 10, 2025

Description

@mo-gill raised an issue similar to the one we saw and fixed in #100, but with a slightly different type of buffer-related problem, and a different file. I added the offending file in a test case here so we have full control over the use case. I suspect it's yet again a NULL vector that's causing it, but I've not looked closely.

Closes #158

Before you get started

Checklist

  • This pull request has a descriptive title and labels
  • This pull request has a minimal description (most was discussed in the issue, but a two-liner description is still desirable)
  • Unit tests have been added (if codecov test fails)
  • All tests pass

@kmuehlbauer
Collaborator

@valeriupredoi This was a bit harder than expected. We were in need of some more bookkeeping, but I hope the solution is good enough.

@codecov

codecov bot commented Dec 12, 2025

Codecov Report

❌ Patch coverage is 45.00000% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.83%. Comparing base (68add89) to head (363e46e).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
pyfive/utilities.py 0.00% 26 Missing ⚠️
pyfive/misc_low_level.py 84.37% 3 Missing and 2 partials ⚠️
pyfive/dataobjects.py 0.00% 1 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (45.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #160      +/-   ##
==========================================
+ Coverage   76.18%   76.83%   +0.65%     
==========================================
  Files          14       15       +1     
  Lines        2893     2936      +43     
  Branches      459      467       +8     
==========================================
+ Hits         2204     2256      +52     
+ Misses        564      558       -6     
+ Partials      125      122       -3     

make function private
@kmuehlbauer kmuehlbauer self-requested a review December 12, 2025 12:58
Collaborator Author

@valeriupredoi valeriupredoi left a comment


wow this was a wee bit of a proper corner case, many thanks @kmuehlbauer
@bnlawrence if you have a minute quickly look this over, and merge, please 🍻

@kmuehlbauer
Collaborator

I'm thinking about adding a test which creates this kind of FractalHeap layout from scratch.

@bnlawrence
Collaborator

@kmuehlbauer Thanks for jumping in on this, it was on my to-do for today. If I've understood what you've suggested here, you've dealt with two issues: one is the "nothing there" case, which you've pretty much handled by using continue instead of break; the other is the bookkeeping around not properly getting all the heap information. Does our test case cover both, or have I got that wrong?

@kmuehlbauer
Collaborator

@bnlawrence I'm pretty much into this now. I've discovered another "flaw" when checking for direct/indirect FHDB/FHIB nesting. I'll try to get this fixed too, and add a dedicated test.

@valeriupredoi
Collaborator Author

superb work @kmuehlbauer - and a fair bit beyond my expertise, so many thanks! It'd be good to have another netCDF4 test file to reflect the second corner/use case you mention, and keep intact the "nothing there" one supplied by @mo-gill that I popped in here 🍻

… all possible blocks, add first fractal heap tests
if nrows <= ndirect_max:
# this info cannot tell us the precise number of blocks;
# it can only tell us the maximum possible number we should parse
if nobjects <= ndirect_max:
@kmuehlbauer kmuehlbauer Dec 15, 2025

Third "flaw":
This prevented decoding, so we need to take table_width into account.
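To make the table_width point concrete, here is a minimal sketch of the bookkeeping. It is not pyfive's implementation: the function name and the parameter values are illustrative, based on the HDF5 fractal heap doubling-table layout (rows 0 and 1 hold blocks of the starting size, each later row doubles the block size, and every row holds `table_width` block addresses).

```python
from math import log2

def max_direct_blocks(table_width, starting_block_size, max_direct_size):
    # Number of rows that can hold direct blocks: rows 0 and 1 use the
    # starting size, then each row doubles until max_direct_size is reached.
    max_direct_rows = int(log2(max_direct_size) - log2(starting_block_size)) + 2
    # Each row holds table_width entries, so the row count alone
    # underestimates the number of block slots to parse.
    return max_direct_rows * table_width

# Illustrative parameters: width 4, 512-byte starting blocks, 64 KiB max
print(max_direct_blocks(4, 512, 64 * 1024))  # 36 direct-block slots, not 9
```

The point of the fix as described above: comparing against the row count without multiplying by `table_width` undercounts the slots and prevented decoding.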


Note: this was only a problem for many, many, many attributes and/or very, very large attributes

@kmuehlbauer kmuehlbauer left a comment

@bnlawrence Added some code comments, as well as comments on the PR.

Hope you find your way through the FractalHeap 🎉


@pytest.mark.parametrize("payload_size", [4033, 4032])
@pytest.mark.parametrize("n_attrs", [10, 11])
def test_huge_object(name, payload_size, n_attrs):

This is something I stumbled over while trying to generate a test. These figures work on my Linux box, but we might need to generalize better.

for k, v in attrs.items():
np.testing.assert_equal(v, attrs2[k])

@pytest.mark.parametrize("n_attrs", [115, 116])

If you debug iterating the fractal heap, you will see that with 115 attributes there is no further FHIB, whereas with 116 there is another FHIB with one nested FHDB. There seems to be no way of checking other than iterating again.
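As a toy model of the nesting described above (not pyfive's actual parser): an indirect block (FHIB) is a table of entries, each of which may be a direct block (FHDB), an unallocated slot, or another FHIB that must be recursed into. Here lists stand in for FHIBs, strings for FHDBs, and `None` for unset slots.

```python
def walk_blocks(block, depth=0):
    """Yield (depth, name) for every direct block reachable from `block`.

    Toy model: a list is an indirect block (FHIB), a string is a direct
    block (FHDB), None is an unallocated slot.
    """
    for entry in block:
        if isinstance(entry, list):
            # nested indirect block: recurse one level deeper
            yield from walk_blocks(entry, depth + 1)
        elif entry is not None:
            yield (depth, entry)
        # entry is None -> unset slot: skip it, but keep iterating

# ~115 attributes: root FHIB holds only direct blocks
small = ["FHDB0", "FHDB1", None]
# ~116 attributes: root FHIB gains a nested FHIB with one more FHDB
large = ["FHDB0", "FHDB1", None, ["FHDB2"]]

print(list(walk_blocks(small)))  # [(0, 'FHDB0'), (0, 'FHDB1')]
print(list(walk_blocks(large)))  # adds (1, 'FHDB2') from the nested FHIB
```

Only by walking every entry (iterating again) does the nested FHDB show up; there is no count in the root block that reveals it up front.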

address = struct.unpack('<Q', fh.read(8))[0]
if address == UNDEFINED_ADDRESS:
break
# if there is no valid address, we move on to the next

So, this was the first issue: there are cases where attributes are in higher rows even if lower rows are not set, so iteration over all possible addresses is needed!

address = struct.unpack('<Q', fh.read(8))[0]
if address == UNDEFINED_ADDRESS:
break
# same here, move on to the next address

Same here.

return self.managed[offset:offset+size]

# map heap_id offset to flat buffer offset
offset = self._heapid_to_buffer_offset(offset)

So, this is the second issue: the original code extracts offset and size into the heap, not into our flat managed buffer. When we have attributes in higher rows, this would break. We somehow have to get the correct addresses; we do that by storing them on the heap object.
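A minimal sketch of the translation this describes, under assumed toy structures (the class and method names here are illustrative, not pyfive's): direct blocks live at arbitrary heap offsets but are read into one flat buffer, so a heap id's (offset, size) must be mapped from heap space into buffer space using the stored block addresses.

```python
from bisect import bisect_right

class FlatHeap:
    """Toy sketch: translate heap offsets into offsets in a flat buffer."""

    def __init__(self, blocks):
        # blocks: list of (heap_offset, payload bytes), in heap-offset order
        self.heap_offsets = [off for off, _ in blocks]
        self.buffer_starts = []   # where each block starts in `managed`
        self.managed = b""
        for _, payload in blocks:
            self.buffer_starts.append(len(self.managed))
            self.managed += payload

    def _heapid_to_buffer_offset(self, offset):
        # find the block containing this heap offset, then re-base it
        i = bisect_right(self.heap_offsets, offset) - 1
        return self.buffer_starts[i] + (offset - self.heap_offsets[i])

    def read(self, offset, size):
        offset = self._heapid_to_buffer_offset(offset)
        return self.managed[offset:offset + size]

# Two direct blocks at non-contiguous heap offsets
heap = FlatHeap([(4096, b"abcd"), (12288, b"wxyz")])
print(heap.read(12290, 2))  # b'yz' -- heap offset 12290 falls in block 2
```

Without the translation step, using the heap offset directly as a buffer offset only works while all blocks happen to start at the beginning of the flat buffer, which is exactly what fails once attributes land in higher rows.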

@bnlawrence
Collaborator

Again, thanks @kmuehlbauer. I've been down in the bowels for a few hours, and I think you've done a great job. In trying to understand it, I think I can see a couple of optimisations, but they are the sort that might look good on paper and achieve nothing in practice, so I am minded not to do anything now. (For the record, the direction of travel would be to try to avoid all the little reads, and to improve the efficiency of the heap-id-to-offset calculation ... though that's the bit I suspect matters more in theory than in practice, especially for NetCDF4, where this is only attribute stuff and so won't be too voluminous, I expect.)

@bnlawrence
Collaborator

I am going to push up some more documentation and a utility, to help us with improving this at some future time, but in terms of your changes, I'll approve this for merge as soon as I've done that.

@bnlawrence bnlawrence left a comment

Ok, as noted above, I think this is good to go!

@valeriupredoi
Collaborator Author

brilliant, great many thanks @kmuehlbauer and @bnlawrence 🍻 Kai, could I please get an Approve from you on this? Then I could merge 🫡

@valeriupredoi
Collaborator Author

superb work gents! 🥳

@valeriupredoi valeriupredoi merged commit a83d389 into main Dec 16, 2025
6 of 7 checks passed
@valeriupredoi valeriupredoi deleted the add_mogill_corner_test branch December 16, 2025 12:54
@valeriupredoi valeriupredoi mentioned this pull request Dec 18, 2025
5 tasks

Successfully merging this pull request may close these issues.

Possible pyfive bug/edge case