-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Byte attr support #9407
Byte attr support #9407
Conversation
hollymandel
commented
Aug 28, 2024
- Closes netCDF4: support byte strings as attribute values #7186
- Adds support for bytes as dataset attributes
- Because h5netcdf does not support arbitrary binaries as attributes, adds a check that the binary is an encoded utf8 free of null characters if this is the engine
Thank you! Could we add a test? |
3739955
to
e2e34d5
Compare
I've made the requested changes + cleaned up the commit history |
Thanks @hollymandel ! I'll merge shortly. Please feel free to add a brief note to whatsnew... |
xarray/tests/test_backends.py
Outdated
def test_byte_attrs(self) -> None: | ||
with create_tmp_file() as tmp_file: | ||
try: | ||
null_byte = b"\x00" | ||
other_bytes = bytes(range(1, 256)) | ||
ds = Dataset({"x": 1}, coords={"x_coord": [1]}) | ||
ds["x"].attrs["null_byte"] = null_byte | ||
ds["x"].attrs["other_bytes"] = other_bytes | ||
self.save(ds, tmp_file) | ||
except ValueError: | ||
assert self.engine == "h5netcdf" | ||
else: | ||
with self.open(tmp_file) as ds_out: | ||
assert ds_out["x"].attrs["null_byte"] == "" | ||
assert ds_out["x"].attrs["other_bytes"] == other_bytes.decode( | ||
errors="replace" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to clean this up would be to rewrite this to
def test_byte_attrs(self, byte_attrs_dataset) -> None:
# TODO: consider using self.roundtrip
with create_tmp_file() as tmp_file:
self.save(ds, tmp_file)
with self.open(tmp_file) as ds_out:
assert ds_out["x"].attrs["null_byte"] == ""
assert ds_out["x"].attrs["other_bytes"] == other_bytes.decode(
errors="replace"
)
and in the h5netcdf tests we could override
def test_byte_attrs(self, byte_attrs_dataset):
with pytest.raises(ValueError, ADD_REGEX_HERE):
super().test_byte_attrs(dataset)
and then elsewhere (conftest.py
) create the dataset as a fixture
@pytest.fixture
def byte_attrs_dataset():
null_byte = b"\x00"
other_bytes = bytes(range(1, 256))
ds = Dataset({"x": 1}, coords={"x_coord": [1]})
ds["x"].attrs["null_byte"] = null_byte
ds["x"].attrs["other_bytes"] = other_bytes
return ds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, I've made these changes! Really appreciate the feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Thanks!
9ec6fb9
to
658c053
Compare
9eddba5
to
c2472ef
Compare
@pytest.fixture | ||
def byte_attrs_dataset(): | ||
"""For testing issue #9407""" | ||
null_byte = b"\x00" | ||
other_bytes = bytes(range(1, 256)) | ||
ds = Dataset({"x": 1}, coords={"x_coord": [1]}) | ||
ds["x"].attrs["null_byte"] = null_byte | ||
ds["x"].attrs["other_bytes"] = other_bytes | ||
|
||
expected = ds.copy() | ||
expected["x"].attrs["null_byte"] = "" | ||
expected["x"].attrs["other_bytes"] = other_bytes.decode(errors="replace") | ||
|
||
return { | ||
"input": ds, | ||
"expected": expected, | ||
"h5netcdf_error": r"Invalid value provided for attribute .*: .*\. Null characters .*", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!