Test case for corner case file (buffer too small)#160
Conversation
…ged buffer offset mappings
@valeriupredoi This was a bit harder than expected. We needed some more bookkeeping, but I hope the solution is good enough.
Codecov Report
❌ Patch coverage is 45.00%, which is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

```
@@            Coverage Diff             @@
##             main     #160      +/-   ##
==========================================
+ Coverage   76.18%   76.83%   +0.65%
==========================================
  Files          14       15       +1
  Lines        2893     2936      +43
  Branches      459      467       +8
==========================================
+ Hits         2204     2256      +52
+ Misses        564      558       -6
+ Partials      125      122       -3
```
make function private
valeriupredoi
left a comment
wow this was a wee bit of a proper corner case, many thanks @kmuehlbauer
@bnlawrence if you have a minute quickly look this over, and merge, please 🍻
I'm thinking about adding a test which creates such a FractalHeap layout from scratch.
@kmuehlbauer Thanks for jumping in on this, it was on my to-do for today. If I've understood what you've suggested here, you've dealt with two issues: one is the "nothing there" case, which you've pretty much dealt with by using continue instead of break, and the other is the bookkeeping around not properly getting all the heap information. Does our test case cover both, or have I got that wrong?
@bnlawrence I'm pretty much into this now. I've discovered another "flaw" when checking for direct/indirect FHDB/FHIB nesting. I'll try to get this fixed, too, and add a dedicated test.
superb work @kmuehlbauer - and a fair bit over my expertise, so many thanks! It'd be good to have another netCDF4 test file to reflect the second corner/use case you mention, and to keep intact the one for the "nothing there" case supplied by @mo-gill that I popped in here 🍻
… all possible blocks, add first fractal heap tests
```diff
-        if nrows <= ndirect_max:
+        # this info cannot tell the precise amount of blocks
+        # it can just tell us the maximum possible amount we should parse
+        if nobjects <= ndirect_max:
```
Third "flaw":
This prevented decoding, so we need to take table_width into account.
Note: this was only a problem for many, many, many attributes and/or very, very large attributes
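To illustrate the point about table_width, here is a minimal sketch of why a row count alone undercounts the reachable direct blocks: each row of the fractal heap's doubling table holds table_width block slots, so the upper bound on direct blocks scales with the width. The function name and the example figures are illustrative, not taken from the codebase.

```python
# Hypothetical sketch: the maximum number of direct blocks reachable
# before an indirect block is needed depends on the table width, not
# on the number of rows alone.
def max_direct_blocks(max_direct_rows, table_width):
    """Upper bound on direct blocks: rows times slots per row."""
    return max_direct_rows * table_width

# With a table width of 4 (a common HDF5 default), two rows can
# address up to 8 direct blocks, not 2.
print(max_direct_blocks(2, 4))  # → 8
```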
kmuehlbauer
left a comment
@bnlawrence Added some code comments, as well as comments on the PR.
Hope you find your way through the FractalHeap 🎉
```python
@pytest.mark.parametrize("payload_size", [4033, 4032])
@pytest.mark.parametrize("n_attrs", [10, 11])
def test_huge_object(name, payload_size, n_attrs):
```
This is something I stumbled over while trying to generate a test. On my Linux these figures work, but we might need to generalize better.
```python
for k, v in attrs.items():
    np.testing.assert_equal(v, attrs2[k])
```
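A self-contained sketch of the round-trip assertion pattern the test uses: every attribute read back must compare equal element-wise. Here a plain dict copy simulates the re-read; in the real test `attrs2` comes from re-opening the HDF5 file.

```python
import numpy as np

# Simulated attributes "written" to a file and "read back" again;
# the dict copy stands in for the actual HDF5 round trip.
attrs = {f"attr{i}": np.arange(4) + i for i in range(3)}
attrs2 = {k: v.copy() for k, v in attrs.items()}

# The test pattern: element-wise equality for every attribute.
for k, v in attrs.items():
    np.testing.assert_equal(v, attrs2[k])
```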
```python
@pytest.mark.parametrize("n_attrs", [115, 116])
```
If you debug the fractal heap iteration, you will see that with 115 attributes there is no further FHIB, whereas with 116 there is another FHIB with one nested FHDB. There seems to be no way of checking other than iterating again.
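The structure being walked can be sketched like this: direct blocks (FHDB) hold objects, indirect blocks (FHIB) hold addresses of further direct or indirect blocks, some of which may be unset. The classes below are illustrative stand-ins, not the project's real types; the point is that only a full walk reveals whether a nested FHIB exists, and that an unset slot must be skipped rather than treated as the end.

```python
# Toy model of a fractal heap: FHDBs carry objects, FHIBs carry
# child blocks (direct, indirect, or None for an unset address).
class DirectBlock:
    def __init__(self, objects):
        self.objects = objects

class IndirectBlock:
    def __init__(self, children):
        self.children = children

def iter_objects(block):
    """Yield all objects, recursing through nested indirect blocks."""
    if isinstance(block, DirectBlock):
        yield from block.objects
        return
    for child in block.children:
        if child is None:
            # unset address: skip it, populated blocks may follow
            continue
        yield from iter_objects(child)

heap = IndirectBlock([
    DirectBlock([1, 2]),
    None,                                # hole in a lower row
    IndirectBlock([DirectBlock([3])]),   # nested FHIB with one FHDB
])
print(list(iter_objects(heap)))  # → [1, 2, 3]
```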
```diff
 address = struct.unpack('<Q', fh.read(8))[0]
 if address == UNDEFINED_ADDRESS:
-    break
+    # if there is no valid address, we move on to the next
+    continue
```
So, this was the first issue: there are cases where attributes are in higher rows even if lower rows are not set. So iteration over all possible addresses is needed!
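A minimal runnable sketch of that fix, assuming a little-endian 8-byte address table like the one read above (the function name and buffer layout are illustrative): an undefined address no longer terminates the scan, because populated slots may follow it.

```python
import io
import struct

# HDF5 marks unset addresses with all-ones; constant shown for the sketch.
UNDEFINED_ADDRESS = 0xFFFFFFFFFFFFFFFF

def read_addresses(fh, count):
    """Read `count` 8-byte addresses, skipping (not stopping at) unset ones."""
    addresses = []
    for _ in range(count):
        address = struct.unpack('<Q', fh.read(8))[0]
        if address == UNDEFINED_ADDRESS:
            # no valid address in this slot: move on to the next
            continue
        addresses.append(address)
    return addresses

# A hole in the middle no longer hides the block that follows it.
buf = io.BytesIO(struct.pack('<3Q', 0x1000, UNDEFINED_ADDRESS, 0x2000))
print(read_addresses(buf, 3))  # → [4096, 8192]
```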
```diff
 address = struct.unpack('<Q', fh.read(8))[0]
 if address == UNDEFINED_ADDRESS:
-    break
+    # same here, move on to the next address
+    continue
```
```python
# map heap_id offset to flat buffer offset
offset = self._heapid_to_buffer_offset(offset)
return self.managed[offset:offset+size]
```
So, this is the second issue: the original code extracts offset and size into the heap, not into our flat managed buffer. When we have attributes in higher rows, this would break. We somehow have to get the correct addresses; we do that by storing them on the heap object.
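A sketch of what that bookkeeping could look like, assuming direct blocks are concatenated into one flat `managed` buffer: record each block's heap-space start next to its start in the flat buffer, then translate a heap-id offset by locating its containing block. Class and method names here are illustrative, not the project's actual API.

```python
import bisect

class ManagedBuffer:
    """Toy model: concatenated direct blocks plus offset bookkeeping."""

    def __init__(self):
        self.managed = b""
        self._heap_starts = []    # heap-space offset of each block
        self._buffer_starts = []  # matching offset in the flat buffer

    def add_block(self, heap_offset, data):
        # blocks must be added in increasing heap-offset order
        self._heap_starts.append(heap_offset)
        self._buffer_starts.append(len(self.managed))
        self.managed += data

    def _heapid_to_buffer_offset(self, offset):
        # find the block containing this heap offset, then rebase it
        i = bisect.bisect_right(self._heap_starts, offset) - 1
        return self._buffer_starts[i] + (offset - self._heap_starts[i])

    def get_object(self, offset, size):
        # map heap_id offset to flat buffer offset
        offset = self._heapid_to_buffer_offset(offset)
        return self.managed[offset:offset + size]

buf = ManagedBuffer()
buf.add_block(0, b"abcd")
buf.add_block(4096, b"wxyz")   # higher-row block: heap space has a gap
print(buf.get_object(4097, 2))  # → b'xy'
```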
Again, thanks @kmuehlbauer. I've been down in the bowels for a few hours, and think you've done a great job. In trying to understand it, I think I can see a couple of optimisations - but they are the sort that might look good on paper but achieve nothing in practice, so I am minded not to do anything now. (For the record, the direction of travel would be to try and avoid all the little reads, and improve the efficiency of the heap-id-to-offset calculation ... though that's the bit that I think is more in theory than in practice, especially for NetCDF4, where this is only attribute stuff, and so won't be too voluminous, I expect.)
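For the "avoid all the little reads" direction, a minimal sketch of the idea (function name and layout are illustrative): instead of one 8-byte read per address, read the whole address table in a single call and unpack it at once.

```python
import io
import struct

def read_addresses_batched(fh, count):
    """Read an entire table of `count` little-endian u64 addresses at once."""
    data = fh.read(8 * count)
    return list(struct.unpack(f'<{count}Q', data))

buf = io.BytesIO(struct.pack('<4Q', 1, 2, 3, 4))
print(read_addresses_batched(buf, 4))  # → [1, 2, 3, 4]
```

Whether this pays off depends on how many addresses a table holds; as noted above, for NetCDF4 attribute metadata the volumes are probably too small to matter.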
I am going to push up some more documentation and a utility, to help us with improving this at some future time, but in terms of your changes, I'll approve this for merge as soon as I've done that. |
bnlawrence
left a comment
Ok, as noted above, I think this is good to go!
brilliant, great many thanks @kmuehlbauer and @bnlawrence 🍻 Kai, could I please get an Approve from you on this? Then I could merge 🫡
superb work gents! 🥳
Description
@mo-gill raised a similar issue to the one we saw and fixed in #100, but with a slightly different type of buffer-related issue and file. I added the offending file in a test case here so we have full control over the use case. I suspect it's yet again a NULL vector that's causing it, but I've not looked closely.
Closes #158