Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Byte attr support #9407

Merged
merged 1 commit into from
Sep 2, 2024
Merged

Byte attr support #9407

merged 1 commit into from
Sep 2, 2024

Conversation

hollymandel
Copy link
Contributor

@max-sixty
Copy link
Collaborator

Thank you!

Could we add a test?

@TomNicholas TomNicholas added the topic-metadata Relating to the handling of metadata (i.e. attrs and encoding) label Aug 28, 2024
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
@hollymandel hollymandel force-pushed the byte_attr_support branch 6 times, most recently from 3739955 to e2e34d5 Compare August 30, 2024 16:27
@hollymandel
Copy link
Contributor Author

I've made the requested changes + cleaned up the commit history

@max-sixty
Copy link
Collaborator

Thanks @hollymandel !

I'll merge shortly. Please feel free to add a brief note to whatsnew...

@max-sixty max-sixty added the plan to merge Final call for comments label Aug 30, 2024
Comment on lines 1407 to 1413
def test_byte_attrs(self) -> None:
with create_tmp_file() as tmp_file:
try:
null_byte = b"\x00"
other_bytes = bytes(range(1, 256))
ds = Dataset({"x": 1}, coords={"x_coord": [1]})
ds["x"].attrs["null_byte"] = null_byte
ds["x"].attrs["other_bytes"] = other_bytes
self.save(ds, tmp_file)
except ValueError:
assert self.engine == "h5netcdf"
else:
with self.open(tmp_file) as ds_out:
assert ds_out["x"].attrs["null_byte"] == ""
assert ds_out["x"].attrs["other_bytes"] == other_bytes.decode(
errors="replace"
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to clean this up would be to rewrite this to

    def test_byte_attrs(self, byte_attrs_dataset) -> None:
        # TODO: consider using self.roundtrip
        with create_tmp_file() as tmp_file:
            self.save(ds, tmp_file)
            with self.open(tmp_file) as ds_out:
                assert ds_out["x"].attrs["null_byte"] == ""
                assert ds_out["x"].attrs["other_bytes"] == other_bytes.decode(
                    errors="replace"
                )

and in the h5netcdf tests we could override

def test_byte_attrs(self, byte_attrs_dataset):
    with pytest.raises(ValueError, ADD_REGEX_HERE):
        super().test_byte_attrs(dataset)

and then elsewhere (conftest.py) create the dataset as a fixture

@pytest.fixture
def byte_attrs_dataset():
null_byte = b"\x00"
    other_bytes = bytes(range(1, 256))
    ds = Dataset({"x": 1}, coords={"x_coord": [1]})
    ds["x"].attrs["null_byte"] = null_byte
    ds["x"].attrs["other_bytes"] = other_bytes
	return ds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I've made these changes! Really appreciate the feedback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Thanks!

Comment on lines +142 to +159
@pytest.fixture
def byte_attrs_dataset():
"""For testing issue #9407"""
null_byte = b"\x00"
other_bytes = bytes(range(1, 256))
ds = Dataset({"x": 1}, coords={"x_coord": [1]})
ds["x"].attrs["null_byte"] = null_byte
ds["x"].attrs["other_bytes"] = other_bytes

expected = ds.copy()
expected["x"].attrs["null_byte"] = ""
expected["x"].attrs["other_bytes"] = other_bytes.decode(errors="replace")

return {
"input": ds,
"expected": expected,
"h5netcdf_error": r"Invalid value provided for attribute .*: .*\. Null characters .*",
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

@max-sixty max-sixty merged commit 5fcf82c into pydata:main Sep 2, 2024
28 checks passed
hollymandel added a commit to hollymandel/xarray that referenced this pull request Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-metadata Relating to the handling of metadata (i.e. attrs and encoding)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

netCDF4: support byte strings as attribute values
4 participants