Regression/#1840: decoding to float64 instead of float32 #9041

Closed
4 of 5 tasks
yt87 opened this issue May 22, 2024 · 8 comments · Fixed by #9045

Comments

@yt87

yt87 commented May 22, 2024

What happened?

xr.open_dataset unnecessarily promotes small integers to float64. This was issue #1840, fixed back in 2018. If I am not mistaken, the fix was reverted in release 2024.3.0.

What did you expect to happen?

I expected float32 in the output below.

Minimal Complete Verifiable Example

from pathlib import Path

import numpy as np
import xarray as xr

# Float32 data with a missing value, to be packed as uint8 on disk.
v = np.ones(5, dtype=np.float32) * 0.99
v[0] = np.nan
da = xr.DataArray(v, name='foo')
print(da)

file = Path('/tmp/foo.nc')
file.unlink(missing_ok=True)

# Pack as uint8; scale_factor is a plain Python float (i.e. a 64-bit double).
encoding = {'foo': {'_FillValue': 255, 'dtype': 'uint8', 'scale_factor': 0.01}}
da.to_netcdf(file, encoding=encoding)

# Read back: the decoded variable comes out as float64 instead of the original float32.
ds = xr.open_dataset(file)
print(ds['foo'])

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

<xarray.DataArray 'foo' (dim_0: 5)> Size: 20B
array([ nan, 0.99, 0.99, 0.99, 0.99], dtype=float32)
Dimensions without coordinates: dim_0
<xarray.DataArray 'foo' (dim_0: 5)> Size: 40B
[5 values with dtype=float64]
Dimensions without coordinates: dim_0

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.9.0-1-MANJARO
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_GB
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.3.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.17.2
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.8
dask: 2024.4.1
distributed: 2024.4.1
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.22.2
sphinx: None
@yt87 yt87 added the bug and needs triage (Issue that has not been reviewed by xarray team member) labels on May 22, 2024
@yt87
Author

yt87 commented May 22, 2024

The relevant change was made to _choose_float_dtype in xarray/coding/variables.py. The variable's dtype is determined by the dtype of scale_factor. Changing
encoding = {'foo': {'_FillValue': 255, 'dtype': 'uint8', 'scale_factor': 0.01}}
to
encoding = {'foo': {'_FillValue': 255, 'dtype': 'uint8', 'scale_factor': np.float32(0.01)}}
is the required fix.
Should this be mentioned somewhere in the docs?
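
For reference, a hedged end-to-end sketch of the workaround above (the file path is illustrative; the expected dtypes reflect the behaviour reported in this issue):

import numpy as np
import xarray as xr

# Same data as the MVCE above, packed as uint8 on disk.
v = np.ones(5, dtype=np.float32) * 0.99
v[0] = np.nan
da = xr.DataArray(v, name='foo')

# Declaring scale_factor as np.float32 (rather than a Python float, which is stored
# as a double) makes the decoder restore float32 on read, per the behaviour
# described in this thread.
encoding = {'foo': {'_FillValue': 255, 'dtype': 'uint8', 'scale_factor': np.float32(0.01)}}
da.to_netcdf('/tmp/foo32.nc', encoding=encoding)

ds = xr.open_dataset('/tmp/foo32.nc')
print(ds['foo'].dtype)  # expected: float32 (float64 when scale_factor is a plain Python float)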

@dcherian dcherian changed the title from "Regression on issue #1840?" to "Regression/#1840: decoding to float64 instead of float32" on May 22, 2024
@kmuehlbauer
Contributor

Thanks @yt87, yes this is correct. Xarray is following CF Conventions as declared in https://docs.xarray.dev/en/stable/user-guide/io.html#reading-encoded-data.

Here the CF Conventions for packed data apply https://cfconventions.org/cf-conventions/cf-conventions.html#packed-data:

When packed data is written, the scale_factor and add_offset attributes must be of the same type as the unpacked data, which must be either float or double. Data of type float must be packed into one of these types: byte, unsigned byte, short, unsigned short. Data of type double must be packed into one of these types: byte, unsigned byte, short, unsigned short, int, unsigned int.

Would it make sense to add a sentence in the above linked doc section about handling packed data which links to the CF Conventions on packed data?
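
To check which dtype the decoder will pick for an existing file, one can skip the mask-and-scale decoding and inspect the stored scale_factor attribute. A minimal sketch, assuming the file written by the MVCE above:

import numpy as np
import xarray as xr

# Open without CF mask/scale decoding to look at the on-disk (packed) representation.
raw = xr.open_dataset('/tmp/foo.nc', mask_and_scale=False)
print(raw['foo'].dtype)  # uint8 -- the packed storage type
print(np.asarray(raw['foo'].attrs['scale_factor']).dtype)  # float64 here, hence the float64 decode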

@kmuehlbauer kmuehlbauer added the topic-CF conventions label and removed the bug and needs triage labels on May 23, 2024
@yt87
Author

yt87 commented May 23, 2024

It would be helpful to have a link. The CF conventions document is quite long.

@kmuehlbauer
Contributor

OK, my comment above referenced the reading section only. There is also a writing section, which also mentions packed-data handling:

https://docs.xarray.dev/en/stable/user-guide/io.html#scaling-and-type-conversions

I think we could just add a reference to the CF Conventions on packed data somewhere in that section.

@kmuehlbauer
Contributor

@yt87 Do you think the additional link to the CF Conventions will help users better understand the behaviour?

https://xray--9045.org.readthedocs.build/en/9045/user-guide/io.html#scaling-and-type-conversions

@yt87
Author

yt87 commented May 24, 2024

I would add a note that scale_factor and add_offset determine the variable's dtype upon reading. That's what tripped me up. I know now that this is covered in the provided link, but the heuristics used in previous releases of xarray produced different results.

@dcherian
Contributor

There's some link to #8946

cc @keewis

@kmuehlbauer
Contributor

@yt87 I've added an explanatory sentence: https://xray--9045.org.readthedocs.build/en/9045/user-guide/io.html#scaling-and-type-conversions per your suggestion.

I've marked the PR as plan-to-merge, so it will likely be included in the next release.
