Description
What happened:
Unicode string coordinates are converted to bytes after a save/load round trip
with the h5netcdf backend. This makes the dataset practically unusable, because
label-based comparisons fail (bytes != str).
What you expected to happen:
String coordinates should load back as strings (str), matching what was saved.
Minimal Complete Verifiable Example:
import numpy as np
import xarray as xr

coils = np.array(["A", "B", "C", "D", "E"])
data = np.array([1 + 1j, 1 + 2j, 3j, 4j, 6j])
test_ds = xr.Dataset()
test_ds.coords["coils"] = coils
test_ds["data"] = ("coils", data)
test_ds
> <xarray.Dataset>
> Dimensions: (coils: 5)
> Coordinates:
> * coils (coils) <U1 'A' 'B' 'C' 'D' 'E'
> Data variables:
> data (coils) complex128 (1+1j) (1+2j) 3j 4j 6j
test_ds.to_netcdf("test.nc", engine="h5netcdf")
del test_ds
test_ds = xr.open_dataset("test.nc", engine="h5netcdf")
test_ds
> <xarray.Dataset>
> Dimensions: (coils: 5)
> Coordinates:
> * coils (coils) object b'A' b'B' b'C' b'D' b'E'
> Data variables:
> data (coils) complex128 ...
Anything else we need to know?:
The issue may be related to #1638.
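As a stopgap, the bytes labels can be decoded back to strings after loading. The snippet below is only a minimal sketch of that workaround, not a fix for the underlying problem. The helper name `decode_bytes_array` is hypothetical (it is not part of xarray or h5netcdf), and it assumes every element is either bytes or str and that the stored text is UTF-8.

```python
import numpy as np


def decode_bytes_array(values, encoding="utf-8"):
    """Return a str-dtype copy of `values`, decoding any bytes elements.

    Hypothetical helper for illustration only. Assumes each element is
    either bytes or str, and that bytes are encoded with `encoding`.
    """
    arr = np.asarray(values, dtype=object)
    # Decode element by element; values that are already str pass through.
    decoded = [v.decode(encoding) if isinstance(v, bytes) else v
               for v in arr.ravel()]
    # Rebuild a unicode array with the original shape.
    return np.array(decoded, dtype=str).reshape(arr.shape)


# Same labels as the coordinate that comes back from the h5netcdf round trip.
loaded = np.array([b"A", b"B", b"C", b"D", b"E"], dtype=object)
restored = decode_bytes_array(loaded)
print(restored.tolist())
```

On the reloaded dataset this could be applied as `test_ds = test_ds.assign_coords(coils=decode_bytes_array(test_ds["coils"].values))`, which should restore string labels for the `coils` coordinate.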
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-65-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: cs_CZ.UTF-8
LOCALE: cs_CZ.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.5
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.3
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.2.1
IPython: 7.19.0
sphinx: None