What happened?
Saving a dataset with an empty object (string) variable to zarr changes the dtype to float64.

What did you expect to happen?
The dtype should remain as O, just as it does for non-empty object variables.

Minimal Complete Verifiable Example
    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"a": np.array([], dtype="O")})
    ds["a"].dtype  # prints: dtype('O')
    ds.to_zarr("a.zarr")
    ds = xr.open_dataset("a.zarr")
    ds["a"].dtype  # prints: dtype('float64')

Relevant log output
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.8.13 (default, Mar 28 2022, 06:16:26) [Clang 12.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.11.0
pandas: 1.5.0
numpy: 1.23.4
scipy: 1.9.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.3
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.10.0
distributed: 2022.10.0
matplotlib: 3.6.1
cartopy: None
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.4.1
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: 4.2.0
This behaviour stems from this part of _infer_dtype, where empty object arrays are converted to float arrays:
xarray/xarray/conventions.py, lines 156 to 157 in 3aa75c8
Is there any reason we couldn't return strings.create_vlen_dtype(str) instead?
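To make the question concrete, here is a minimal sketch of the inference step being discussed. It is not xarray's actual code: `infer_dtype_sketch`, `vlen_str_dtype` and the `empty_as_str` flag are hypothetical names invented for illustration, and the sketch assumes that `strings.create_vlen_dtype(str)` amounts to an object dtype tagged with `element_type` metadata. It uses only NumPy, so it can be run without xarray or zarr.

```python
import numpy as np


def vlen_str_dtype():
    # Hypothetical stand-in for strings.create_vlen_dtype(str):
    # an object dtype carrying the element type as metadata (assumption).
    return np.dtype("O", metadata={"element_type": str})


def infer_dtype_sketch(array, empty_as_str=False):
    # Simplified illustration of inferring a dtype for an object array
    # from its first element. Not the real _infer_dtype.
    if array.dtype.kind != "O":
        raise TypeError("expected a dtype=object array")

    if array.size == 0:
        # There is no first element to inspect, so a default is needed.
        # The reported behaviour corresponds to the float default;
        # the proposal corresponds to the string default.
        return vlen_str_dtype() if empty_as_str else np.dtype(float)

    element = array[(0,) * array.ndim]
    if isinstance(element, str):
        return vlen_str_dtype()
    return np.array(element).dtype


empty = np.array([], dtype="O")
current_dtype = infer_dtype_sketch(empty)                      # float default
proposed_dtype = infer_dtype_sketch(empty, empty_as_str=True)  # stays object
print(current_dtype, proposed_dtype.kind)
```

With the float default, the empty variable is written as a float array, so it reads back as float64. With the string default it would keep an object dtype, matching what already happens for non-empty string arrays. Whether that default is safe for every backend is the open question here.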
@tomwhite Sorry for the delay here. I'll respond shortly on your PR #7862, but we might have to reiterate here later