fix Enum and Empty attributes#102
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

    @@ Coverage Diff @@
    ##             main     #102      +/-   ##
    ==========================================
    + Coverage   72.58%   72.73%   +0.14%
    ==========================================
      Files          11       11
      Lines        2499     2505       +6
      Branches      379      382       +3
    ==========================================
    + Hits         1814     1822       +8
    + Misses        583      580       -3
    - Partials      102      103       +1
kmuehlbauer
left a comment
I've left some comments for clarification.
tests/test_attr_datatypes.py (Outdated)

    def test_string_array_attr_datatypes():
        with pyfive.File(ATTR_DATATYPES_HDF5_FILE) as hfile:
            assert hfile.attrs['vlen_str_array'][0] == 'Hello'
            assert hfile.attrs['vlen_str_array'][1] == 'World!'

            assert hfile.attrs['vlen_str_array'].dtype == np.dtype('O')
            assert hfile.attrs['vlen_str_array'].dtype.metadata == {'h5py_encoding': 'utf-8'}
This reads back now as strings. Because I had to fix this, I've taken the chance to move this to its own test function.
So, I think this is a different behaviour from h5py:

    with h5py.File(ATTR_DATATYPES_HDF5_FILE) as hfile:
        x = hfile.attrs['vlen_str_array'][0]
        print(x)
    b'Hello'

so I'm thinking this is not what we want?
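As an aside, one version-tolerant way for a test to cope with either behaviour is to normalise before comparing. This is only a sketch, not code from the PR: the real `ATTR_DATATYPES_HDF5_FILE` is not available here, so a stand-in file is written on the fly.

```python
# Sketch: h5py versions differ in whether string attributes read back as
# bytes or str, so normalise before comparing. The file written here is a
# stand-in for ATTR_DATATYPES_HDF5_FILE.
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "attr_strings.h5")
with h5py.File(path, "w") as f:
    # bytes objects become variable-length ASCII strings
    f.attrs["vlen_str_array"] = [b"Hello", b"World!"]

with h5py.File(path, "r") as f:
    val = f.attrs["vlen_str_array"][0]

# decode bytes, pass str through unchanged (np.bytes_/np.str_ subclass both)
first = val.decode("utf-8") if isinstance(val, bytes) else str(val)
```

A test comparing against `'Hello'` then passes regardless of which representation h5py hands back.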
I'll double check this one.
@bnlawrence This is obviously new behaviour of h5py introduced some time between 2017 and now. If we recreate the file now, the following line of code:

    attrs['vlen_str_array'] = [b'Hello', b'World!']

translates to this HDF5 dump:
ATTRIBUTE "vlen_str_array" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): "Hello", "World!"
}
}
whereas the old file translates to:
ATTRIBUTE "vlen_str_array" {
DATATYPE H5T_STRING {
STRSIZE 6;
STRPAD H5T_STR_NULLPAD;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): "Hello\000", "World!"
}
}
So from my perspective this reveals a big flaw in our testing suite: although we should be able to read ancient files, we should not expect that reading them gives the same output as files written with current versions of h5py/libhdf5.
I've managed to create both of the above representations with recent h5py and added the two options for now to the tests.
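For reference, both dumps above can be reproduced with recent h5py; the following is a sketch (attribute names and the file path are illustrative, not taken from the PR's make scripts):

```python
# Sketch: writing both string representations seen in the two HDF5 dumps.
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "str_reprs.h5")
with h5py.File(path, "w") as f:
    # variable-length ASCII strings (STRSIZE H5T_VARIABLE), the current
    # default when assigning a list of bytes
    f.attrs["vlen_str_array"] = [b"Hello", b"World!"]
    # fixed-size null-padded strings (STRSIZE 6, H5T_STR_NULLPAD),
    # matching the old 2017-era file
    f.attrs.create("fixed_str_array", np.array([b"Hello", b"World!"], dtype="S6"))

with h5py.File(path, "r") as f:
    names = sorted(f.attrs)
```

Dumping such a file with `h5dump` should show `H5T_VARIABLE` for the first attribute and a fixed `STRSIZE` for the second.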
And from looking at this, the old version doesn't seem to be variable length at all.
I've added a fix for #100 as it fits thematically in here. Please let me know if you are OK with this.
woohoo @kmuehlbauer you're an absolute legend, was bashing me brains about #100 on Friday, many thanks! Would you mind if I added an integration test here, with a real (but small) file that replicates that issue? 🍻
Yes, please go ahead. 🚀 I'm currently working on ironing out some more issues with EnumType and other user types (e.g. COMPOUND).
brill! Should I just PR it to your fork? I still need to get the correct (small) file, I was banking on Iris to chop the original file - which I did, but Iris has fixed that NULL (which is interesting to see it doing that)
@valeriupredoi Just push any relevant changes directly to my PR. No need to fiddle with PRs here.
OK @kmuehlbauer I've been trying to write a file with that NULL DATASPACE for a wee while now - no success, netCDF4 is actually writing it but adds a point to the data, and if you try to pass a Null/None, it complains - do you know how to create one? Got no idea how that file was created to have such a peculiar attribute/null value 🍻
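One way to produce such a file directly with h5py (bypassing netCDF4) is `h5py.Empty`, which writes an attribute with a NULL dataspace. A minimal sketch, with an illustrative attribute name:

```python
# Sketch: h5py.Empty creates an attribute with a NULL dataspace,
# the kind of attribute netCDF4 refuses to write directly.
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "null_attr.h5")
with h5py.File(path, "w") as f:
    f.attrs["null_attr"] = h5py.Empty("f4")

with h5py.File(path, "r") as f:
    # reading an empty attribute back yields an h5py.Empty instance
    is_empty = isinstance(f.attrs["null_attr"], h5py.Empty)
```

`h5dump` on such a file should report `DATASPACE NULL` for the attribute.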
valeriupredoi
left a comment
this fixes #100 very nicely - I tested both locally (POSIX) and remotely (S3), very many thanks @kmuehlbauer 🍻
I've added one attribute
@bnlawrence Kai was very kind with my HDF5-PTSD-ed nerves, and fixed the NULL issue here (and other bits too), tests pass, including the one I added to test with that CORDEX file, but it'd be useful if you also had a minute to have a look over it, and merge, please 🍺
bnlawrence
left a comment
I'm slightly concerned that pyfive will now be doing different things with strings? (The bug fix is fine though!)
kmuehlbauer
left a comment
@bnlawrence I've worked along your suggestions.
Should we consider #108 (comment)?
@bnlawrence Following our discussion in #108, I've reverted the changes to the ingested hdf5 file. Instead I only changed the corresponding make file, which now takes a destination file path as an argument. For testing I introduced a fixture which creates that file on demand, and the newly added features are tested in new functions. Is this a way forward? Slight changes to the make scripts and only some changes to the tests.
Great. Thanks. @valeriupredoi do your thing.
Very many thanks @kmuehlbauer and @bnlawrence 🍺🍺
Description
This fixes an issue with enum attributes, where the dtype was returned instead of the value. It also fixes an issue where Empty attributes (with NULL dataspace) were not handled properly.
Closes #101
Closes #100
While doing this, I had to enhance the tests and the corresponding make/data file. I also had to fix some inconsistencies in that particular test file with respect to numpy.
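As a sketch of the enum behaviour the fix targets (attribute and member names are illustrative, not from the PR's test data), an enum attribute should read back as its stored value, not as its dtype:

```python
# Sketch: writing and reading an enum attribute with h5py; the read
# should yield the stored integer value, not the enum dtype itself.
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "enum_attr.h5")
colors = h5py.enum_dtype({"RED": 0, "GREEN": 1, "BLUE": 2}, basetype="i")
with h5py.File(path, "w") as f:
    f.attrs.create("color", 1, dtype=colors)

with h5py.File(path, "r") as f:
    value = int(f.attrs["color"])  # the stored integer, here GREEN
```

Before this fix, pyfive reportedly returned the dtype instead of the value for such attributes; h5py returns the value, and pyfive should now match.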