Decoding non-utf-8 encoded strings with the h5netcdf engine #5563

Closed
@kiksekage

What happened:
Loading a netCDF file-like object (an io.BytesIO) whose attribute strings are not UTF-8 encoded with the h5netcdf engine raises a UnicodeDecodeError.

What you expected to happen:
Loading the same file from disk with the netcdf4 engine works fine. However, the netcdf4 engine doesn't support file-like objects, so I had to use the h5netcdf engine, which is how I ran into this issue.
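Since the netcdf4 engine reads this file correctly from disk, one possible stopgap is to write the in-memory buffer to a temporary file first. The sketch below covers only that step, using the standard library; `spill_to_tempfile` is a hypothetical helper name, not part of xarray or netCDF4.

```python
import io
import os
import tempfile


def spill_to_tempfile(buf, suffix=".nc"):
    """Write the contents of an io.BytesIO to a temporary file and return its path.

    Hypothetical helper for illustration. The caller is responsible for
    deleting the file once it is no longer needed.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so another library can reopen it by path.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(buf.getvalue())
        return tmp.name


# Example with placeholder bytes standing in for real netCDF content.
buf = io.BytesIO(b"placeholder bytes")
path = spill_to_tempfile(buf)
with open(path, "rb") as fh:
    data_on_disk = fh.read()
os.remove(path)
```

With a real netCDF buffer, the returned path could then be passed to `xr.load_dataset(path, engine="netcdf4")` before removing the file. This avoids the decoding error at the cost of a round trip through disk, so it is a workaround rather than a fix.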

Traceback:
Traceback (most recent call last):
  File "", line 1, in
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py", line 242, in load_dataset
    with open_dataset(filename_or_obj, **kwargs) as ds:
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py", line 496, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 384, in open_dataset
    ds = store_entrypoint.open_dataset(
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/store.py", line 22, in open_dataset
    vars, attrs = store.load()
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/common.py", line 126, in load
    attributes = FrozenDict(self.get_attrs())
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 234, in get_attrs
    return FrozenDict(read_attributes(self.ds))
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 75, in read_attributes
    v = maybe_decode_bytes(v)
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 63, in maybe_decode_bytes
    return txt.decode("utf-8")
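The last frame shows a strict `txt.decode("utf-8")`, which is where the error comes from. The following sketch reproduces that failure with the standard library alone and shows how Python's built-in error handlers behave on the same byte. `decode_attr_bytes` and its `encoding`/`errors` parameters are hypothetical and are not xarray's API; this is an illustration of the options, not a proposed patch.

```python
def decode_attr_bytes(value, encoding="utf-8", errors="strict"):
    """Decode bytes to str; return any other value unchanged.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value


title = b"\xc3"  # a UTF-8 lead byte with no continuation byte: invalid UTF-8

# Strict decoding, as in the traceback, raises UnicodeDecodeError.
try:
    decode_attr_bytes(title)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "replace" substitutes U+FFFD for the undecodable byte (lossy).
replaced = decode_attr_bytes(title, errors="replace")

# "surrogateescape" maps the byte to a lone surrogate so the original
# bytes can be recovered by encoding with the same handler.
escaped = decode_attr_bytes(title, errors="surrogateescape")
roundtrip = escaped.encode("utf-8", "surrogateescape")

# If the real encoding is known (Latin-1 is assumed here purely as an
# example), decoding with it succeeds.
latin = decode_attr_bytes(title, encoding="latin-1")
```

The handlers trade off differently: `replace` never fails but discards the original byte, `surrogateescape` preserves it at the cost of producing a string that cannot be encoded as plain UTF-8, and an explicit encoding only helps when the file's actual encoding is known.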

Minimal Complete Verifiable Example:

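import xarray as xr
import netCDF4

# A lone 0xC3 byte is not valid UTF-8.
title = b'\xc3'

# Write a netCDF file with those bytes as a global attribute.
f = netCDF4.Dataset('test.nc', 'w')
f.title = title
f.close()

# Raises UnicodeDecodeError with the h5netcdf engine.
xr.load_dataset("test.nc", engine="h5netcdf")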

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.0 (default, Feb 25 2021, 22:10:10)
[GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-136-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.1
pandas: 1.2.4
numpy: 1.20.3
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 57.0.0
pip: 21.1.3
conda: None
pytest: 6.2.4
IPython: 7.25.0
sphinx: None
