Decoding non-utf-8 encoded strings with the h5netcdf engine #5563
Comments
Automatic decoding of bytes was implemented in #477 so that returned bytes are properly decoded for CF decoding. For non-utf-8 input this breaks, as shown.
The question is what should be returned in the non-standard case, where the attribute contains non-utf-8 encoded bytes. We could catch the Why are those attributes in non-utf-8 encoding? Legacy data?
Revisiting this now. Is there any way forward here, or should we close as won't fix?
Can we raise a warning and leave them encoded?
Should work, I can have a look.
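The "raise a warning and leave them encoded" idea could look roughly like the sketch below. It is not the fix that was merged: `maybe_decode_bytes_lenient` is a hypothetical name, and the choice of `UnicodeWarning` is an assumption. The only part taken from the traceback is the `txt.decode("utf-8")` call.

```python
import warnings


def maybe_decode_bytes_lenient(txt):
    """Hypothetical variant: decode UTF-8 bytes, otherwise warn and pass through."""
    if isinstance(txt, bytes):
        try:
            return txt.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: leave the attribute as raw bytes so the caller
            # can decode it with whatever encoding the data actually uses.
            warnings.warn(
                "attribute is not valid UTF-8; returning raw bytes",
                UnicodeWarning,
                stacklevel=2,
            )
            return txt
    # Non-bytes values (str, numbers, arrays) are returned unchanged.
    return txt
```

With this behaviour the dataset would still open, and the affected attributes would show up as `bytes` instead of `str`.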
What happened:
Trying to load a netCDF file-like object (io.BytesIO) with attribute strings in a non-utf-8 encoding with the h5netcdf engine leads to a UnicodeDecodeError.
What you expected to happen:
Loading the same file, albeit persisted to disk, with the netcdf4 engine works fine. However, since the netcdf4 engine doesn't support file-like objects, I ran into this issue.
Traceback:
Traceback (most recent call last):
  File "", line 1, in
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py", line 242, in load_dataset
    with open_dataset(filename_or_obj, **kwargs) as ds:
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py", line 496, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 384, in open_dataset
    ds = store_entrypoint.open_dataset(
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/store.py", line 22, in open_dataset
    vars, attrs = store.load()
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/common.py", line 126, in load
    attributes = FrozenDict(self.get_attrs())
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 234, in get_attrs
    return FrozenDict(_read_attributes(self.ds))
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 75, in _read_attributes
    v = maybe_decode_bytes(v)
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 63, in maybe_decode_bytes
    return txt.decode("utf-8")
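The last frame can be reproduced without any netCDF file. The snippet below is only an illustration: the attribute value and the Latin-1 encoding are assumptions, since the actual encoding of the affected file is not stated here.

```python
# Hypothetical attribute value, stored as Latin-1 bytes rather than UTF-8.
raw = "Météo".encode("latin-1")

try:
    # Same call as in maybe_decode_bytes at the bottom of the traceback.
    raw.decode("utf-8")
except UnicodeDecodeError:
    failed = True
else:
    failed = False

# Decoding with the encoding the bytes were written in succeeds.
recovered = raw.decode("latin-1")
```

Here `failed` ends up `True`, because the byte `0xE9` is not followed by a valid UTF-8 continuation byte.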
Minimal Complete Verifiable Example:
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.0 (default, Feb 25 2021, 22:10:10)
[GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-136-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.18.1
pandas: 1.2.4
numpy: 1.20.3
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 57.0.0
pip: 21.1.3
conda: None
pytest: 6.2.4
IPython: 7.25.0
sphinx: None