
Unicode strings unexpectedly transformed to byte strings upon open_dataset #4859

Closed
@kripnerl


What happened:

Unicode coordinates are converted to bytes after saving and reloading with the h5netcdf backend. This leaves the dataset practically unusable, because bytes != str.
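To make the "bytes != str" point concrete: in Python 3 a `bytes` object never compares equal to a `str`, so once the coordinate comes back as `b'A'`, comparisons against the original labels match nothing. A minimal NumPy-only sketch, where the two arrays are assumptions that mirror the `coils` coordinate before saving and after reloading:

```python
import numpy as np

# What was written: a fixed-width unicode array (dtype '<U1')
written = np.array(["A", "B", "C", "D", "E"])
# What came back (illustrative): an object array holding bytes objects
loaded = np.array([b"A", b"B", b"C", b"D", b"E"], dtype=object)

# bytes and str never compare equal in Python 3
print(b"A" == "A")             # False
# elementwise comparison against the original label matches nothing
print((loaded == "A").any())   # False
# whereas the original unicode array does match
print((written == "A").any())  # True
```

The same mismatch is why label-based lookups using the original string labels would not be expected to find the reloaded entries.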

What you expected to happen:

The strings should be loaded back as strings, not bytes.

Minimal Complete Verifiable Example:

import numpy as np
import xarray as xr

coils = np.array(["A", "B", "C", "D", "E"])
data = np.array([1 + 1j, 1 + 2j, 3j, 4j, 6j])

test_ds = xr.Dataset()
test_ds.coords["coils"] = coils
test_ds["data"] = ("coils", data)
test_ds

> <xarray.Dataset>
> Dimensions:  (coils: 5)
> Coordinates:
>   * coils    (coils) <U1 'A' 'B' 'C' 'D' 'E'
> Data variables:
>     data     (coils) complex128 (1+1j) (1+2j) 3j 4j 6j



test_ds.to_netcdf("test.nc", engine="h5netcdf")
del test_ds
test_ds = xr.open_dataset("test.nc", engine="h5netcdf")
test_ds

> <xarray.Dataset>
> Dimensions:  (coils: 5)
> Coordinates:
>   * coils    (coils) object b'A' b'B' b'C' b'D' b'E'
> Data variables:
>     data     (coils) complex128 ...
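Until the backend behavior is addressed, one possible workaround is to decode the reloaded coordinate values back to `str`. This is a minimal sketch, assuming the bytes are UTF-8 encoded; `decode_bytes_array` is a hypothetical helper written for illustration, not part of xarray or h5netcdf:

```python
import numpy as np


def decode_bytes_array(values, encoding="utf-8"):
    """Return a unicode NumPy array, decoding any bytes elements.

    Hypothetical helper: elements that are already str pass through
    unchanged; the original shape is preserved.
    """
    flat = np.asarray(values, dtype=object).ravel()
    decoded = [v.decode(encoding) if isinstance(v, bytes) else v for v in flat]
    return np.array(decoded, dtype=str).reshape(np.shape(values))


# Illustrative stand-in for test_ds["coils"].values after reloading
loaded = np.array([b"A", b"B", b"C", b"D", b"E"], dtype=object)
fixed = decode_bytes_array(loaded)
print(fixed, fixed.dtype)  # ['A' 'B' 'C' 'D' 'E'] <U1
```

Applied to the dataset from the example, this would look something like `test_ds = test_ds.assign_coords(coils=decode_bytes_array(test_ds["coils"].values))`. It only patches the symptom after loading and does not change what the h5netcdf backend returns.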

Anything else we need to know?:

The issue may be related to #1638.

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-65-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: cs_CZ.UTF-8
LOCALE: cs_CZ.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.3

xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.5
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.3
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.2.1
IPython: 7.19.0
sphinx: None
