Description
What happened?
With a change from xarray version 2022.06.0 to 2022.09.0, the following output is no longer written as float32 but as float64.
What did you expect to happen?
I expected the output to have the same dtype.
Minimal Complete Verifiable Example
import xarray as xr
ds = xr.tutorial.load_dataset("eraint_uvz")
encoding = {'z': {'zlib': True}}
ds.z.to_netcdf("compressed.nc", encoding=encoding)
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
# xarray version == 2022.06.0
netcdf compressed {
dimensions:
longitude = 480 ;
latitude = 241 ;
level = 3 ;
month = 2 ;
variables:
float longitude(longitude) ;
longitude:_FillValue = NaNf ;
longitude:units = "degrees_east" ;
longitude:long_name = "longitude" ;
float latitude(latitude) ;
latitude:_FillValue = NaNf ;
latitude:units = "degrees_north" ;
latitude:long_name = "latitude" ;
int level(level) ;
level:units = "millibars" ;
level:long_name = "pressure_level" ;
int month(month) ;
float z(month, level, latitude, longitude) ;
z:_FillValue = NaNf ;
z:number_of_significant_digits = 5 ;
z:units = "m**2 s**-2" ;
z:long_name = "Geopotential" ;
z:standard_name = "geopotential" ;
# xarray version == 2022.09.0
netcdf compressed {
dimensions:
longitude = 480 ;
latitude = 241 ;
level = 3 ;
month = 2 ;
variables:
float longitude(longitude) ;
longitude:_FillValue = NaNf ;
longitude:units = "degrees_east" ;
longitude:long_name = "longitude" ;
float latitude(latitude) ;
latitude:_FillValue = NaNf ;
latitude:units = "degrees_north" ;
latitude:long_name = "latitude" ;
int level(level) ;
level:units = "millibars" ;
level:long_name = "pressure_level" ;
int month(month) ;
double z(month, level, latitude, longitude) ;
z:_FillValue = NaN ;
z:number_of_significant_digits = 5 ;
z:units = "m**2 s**-2" ;
z:long_name = "Geopotential" ;
z:standard_name = "geopotential" ;
Anything else we need to know?
In addition to the change of dtype from float to double, I wonder if both outputs should actually rather be int16, because this is the dtype of the original dataset:
>>> import xarray as xr
>>> ds = xr.tutorial.load_dataset("eraint_uvz")
>>> ds.z.encoding
{'source': '.../.cache/xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc', 'original_shape': (2, 3, 241, 480), 'dtype': dtype('int16'), '_FillValue': nan, 'scale_factor': -1.7250274674967954, 'add_offset': 66825.5}
>>> ds.z.to_netcdf("original.nc")
netcdf original {
dimensions:
longitude = 480 ;
latitude = 241 ;
level = 3 ;
month = 2 ;
variables:
float longitude(longitude) ;
longitude:_FillValue = NaNf ;
longitude:units = "degrees_east" ;
longitude:long_name = "longitude" ;
float latitude(latitude) ;
latitude:_FillValue = NaNf ;
latitude:units = "degrees_north" ;
latitude:long_name = "latitude" ;
int level(level) ;
level:units = "millibars" ;
level:long_name = "pressure_level" ;
int month(month) ;
short z(month, level, latitude, longitude) ;
z:_FillValue = 0s ;
z:number_of_significant_digits = 5 ;
z:units = "m**2 s**-2" ;
z:long_name = "Geopotential" ;
z:standard_name = "geopotential" ;
z:add_offset = 66825.5 ;
z:scale_factor = -1.7250274674968 ;
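For reference, the add_offset and scale_factor attributes in the dump above follow the CF packing convention, where the physical value is `packed * scale_factor + add_offset`. The following is a minimal pure-Python sketch of that arithmetic using the two values from `ds.z.encoding`. The helper names `pack` and `unpack` are hypothetical, fill-value handling is left out, and this is not xarray's own implementation.

```python
# Values copied from ds.z.encoding in this report.
SCALE_FACTOR = -1.7250274674967954
ADD_OFFSET = 66825.5


def unpack(packed: int) -> float:
    """Decode a stored int16 value into a physical value (CF convention)."""
    return packed * SCALE_FACTOR + ADD_OFFSET


def pack(value: float) -> int:
    """Encode a physical value as the nearest int16 step (hypothetical helper)."""
    packed = round((value - ADD_OFFSET) / SCALE_FACTOR)
    if not -32768 <= packed <= 32767:
        raise OverflowError(f"{value} does not fit into int16 with this packing")
    return packed


# Round trip: the error is bounded by half a quantisation step.
value = 50000.0
restored = unpack(pack(value))
max_error = abs(SCALE_FACTOR) / 2
```

Because the packed file stores only 16-bit steps, the round trip is lossy by up to half of `abs(scale_factor)`, which is why writing the decoded values as float32 or float64 is not equivalent to writing the original int16 data.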
Sorry for mixing an issue with a question, but why are add_offset and scale_factor applied and the values saved as float32/float64 when encoding is set? I guess encoding in to_netcdf is overwriting the initial encoding, because
ds.z.to_netcdf("test_w_offset.nc", encoding={"z":{"add_offset":66825.5, "scale_factor":-1.7250274674968, "dtype":'int16'}})
produces the expected output, which matches the original one. So I imagine a good way of setting the output encoding is currently something like
ds.to_netcdf("compressed.nc", encoding={v: {**ds[v].encoding, "zlib": True} for v in ds.data_vars})
when an encoding similar to the input encoding, with additional parameters (e.g. 'zlib'), is requested.
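The merging step in that workaround can be sketched on plain dictionaries, without xarray. The helper name `with_extra_encoding` is hypothetical, and the `original` dict is a stand-in for `{v: ds[v].encoding for v in ds.data_vars}`, trimmed to the keys relevant here.

```python
def with_extra_encoding(encodings, **extra):
    """Return a copy of per-variable encodings with extra keys added.

    Keys in `extra` take precedence over keys already present.
    """
    return {name: {**enc, **extra} for name, enc in encodings.items()}


# Stand-in for the encoding read from the tutorial dataset (assumption:
# only the keys relevant here are shown, dtype given as a string).
original = {
    "z": {
        "dtype": "int16",
        "scale_factor": -1.7250274674967954,
        "add_offset": 66825.5,
    }
}

encoding = with_extra_encoding(original, zlib=True)
```

With a real dataset the result would be passed as `ds.to_netcdf("compressed.nc", encoding=encoding)`. Whether every key from the input encoding is accepted may depend on the backend, so this is a sketch rather than a guaranteed recipe.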
Environment
xarray: 2022.6.0  # or 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: None
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: None
IPython: 8.3.0
sphinx: None