netCDF encoding and decoding issues. #8957
Comments
@Thomas-Z Thanks for the well-written issue. The first issue is with Timedelta decoding. If you remove the The second issue is a non-conforming CF attribute. |
We could cast and raise a warning. It should be OK to open a non-conforming file with xarray. |
Not sure if it helps but keeping the unit and removing the fill_value makes it work too.
Right, I was not aware of that.
In my example I can open the non-conforming file. |
Yes, I would have thought so. The CF mask coder is only applied when
So, for the second case we already allow to read int64 packed into int8 (which is not CF conforming). But then it might be good to raise a more specific error on write, here (non conforming CF). |
My problem is more about the fact that we can no longer read these types of variables without setting Decoding it as a timedelta64 with the option to disable it with Simple rules (when possible) will not satisfy everyone, but we will not have any surprises and we can adapt. |
I'm having similar issues, but with reading a preexisting data file from Metop-C's ASCAT instrument. Maybe these files are non-conforming (I'm not sure), but they are official files from EUMETSAT. Unless I'm misunderstanding something, though, the file appears to follow the rules regarding packed data linked by @Thomas-Z. The data are packed as an Opening the file with
> df.time.values
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
Cell In[35], line 1
----> 1 dat.time.values
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/dataarray.py:784, in DataArray.values(self)
771 @property
772 def values(self) -> np.ndarray:
773 """
774 The array's data converted to numpy.ndarray.
775
(...)
782 to this array may be reflected in the DataArray as well.
783 """
--> 784 return self.variable.values
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/variable.py:525, in Variable.values(self)
522 @property
523 def values(self):
524 """The variable's data as a numpy.ndarray"""
--> 525 return _as_array_or_item(self._data)
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/variable.py:323, in _as_array_or_item(data)
309 def _as_array_or_item(data):
310 """Return the given values as a numpy array, or as an individual item if
311 it's a 0d datetime64 or timedelta64 array.
312
(...)
321 TODO: remove this (replace with np.asarray) once these issues are fixed
322 """
--> 323 data = np.asarray(data)
324 if data.ndim == 0:
325 if data.dtype.kind == "M":
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:806, in MemoryCachedArray.__array__(self, dtype)
805 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray:
--> 806 return np.asarray(self.get_duck_array(), dtype=dtype)
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:809, in MemoryCachedArray.get_duck_array(self)
808 def get_duck_array(self):
--> 809 self._ensure_cached()
810 return self.array.get_duck_array()
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:803, in MemoryCachedArray._ensure_cached(self)
802 def _ensure_cached(self):
--> 803 self.array = as_indexable(self.array.get_duck_array())
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:760, in CopyOnWriteArray.get_duck_array(self)
759 def get_duck_array(self):
--> 760 return self.array.get_duck_array()
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:630, in LazilyIndexedArray.get_duck_array(self)
625 # self.array[self.key] is now a numpy array when
626 # self.array is a BackendArray subclass
627 # and self.key is BasicIndexer((slice(None, None, None),))
628 # so we need the explicit check for ExplicitlyIndexed
629 if isinstance(array, ExplicitlyIndexed):
--> 630 array = array.get_duck_array()
631 return _wrap_numpy_scalars(array)
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/coding/variables.py:81, in _ElementwiseFunctionArray.get_duck_array(self)
80 def get_duck_array(self):
---> 81 return self.func(self.array.get_duck_array())
File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/coding/variables.py:399, in _scale_offset_decoding(data, scale_factor, add_offset, dtype)
397 data = data.astype(dtype=dtype, copy=True)
398 if scale_factor is not None:
--> 399 data *= scale_factor
400 if add_offset is not None:
401 data += add_offset
UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
I can get around this by opening the file with The
<xarray.DataArray 'time' (NUMROWS: 3264, NUMCELLS: 82)> Size: 2MB
[267648 values with dtype=int64]
Coordinates:
lat (NUMROWS, NUMCELLS) float64 2MB ...
lon (NUMROWS, NUMCELLS) float64 2MB ...
Dimensions without coordinates: NUMROWS, NUMCELLS
Attributes:
valid_min: 0
valid_max: 2147483647
standard_name: time
long_name: time
units: seconds since 1990-01-01 00:00:00
When read with
{'_FillValue': -2147483647,
'missing_value': -2147483647,
'valid_min': 0,
'valid_max': 2147483647,
'standard_name': 'time',
'long_name': 'time',
'scale_factor': 1.0,
'add_offset': 0.0}
I can replicate the error by attempting to do an in-place operation on some of the time data after reading with
In [54]: df = xr.open_dataset(fname, mask_and_scale=False, decode_times=False)
In [55]: tmp = df.time.values[0:10, 0:10]
In [56]: tmp.dtype
Out[56]: dtype('int32')
In [57]: df.time.attrs['scale_factor'].dtype
Out[57]: dtype('float64')
In [58]: tmp *= dat2.time.attrs.get('scale_factor')
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
Cell In[58], line 1
----> 1 tmp *= dat2.time.attrs.get('scale_factor')
UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int32') with casting rule 'same_kind'
Am I doing something wrong? Is the file non-conformant? Is there a way to solve this issue without doing all of my own masking, scaling, and conversion to datetime? |
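For readers without the EUMETSAT file at hand, the casting failure in the traceback above can probably be reproduced with NumPy alone. The following is a minimal sketch, assuming a small made-up int32 array stands in for the packed time data; the array values and the variable names are illustrative and not taken from the file.

```python
import numpy as np

# Made-up stand-in for the packed on-disk values (assumption, not real data).
raw = np.array([0, 10, 20], dtype=np.int32)
scale_factor = np.float64(1.0)

# In-place multiplication must write a float64 result back into an int32
# buffer, which NumPy refuses under the default 'same_kind' casting rule.
try:
    raw *= scale_factor
    in_place_failed = False
except TypeError:  # UFuncTypeError is a subclass of TypeError
    in_place_failed = True

# Out-of-place multiplication allocates a new array, so the result is
# simply promoted to float64.
scaled = raw * scale_factor

# Casting to float first makes the in-place form legal as well.
as_float = raw.astype(np.float64)
as_float *= scale_factor
```

This suggests the error does not depend on the value of the scale factor (1.0 here) but only on its floating-point dtype meeting an integer array in an in-place operation.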
@jsolbrig Sorry for the delay here. The issue is with the on-disk data
'scale_factor': 1.0,
'add_offset': 0.0
This will decode the int64 into float64 before decoding times. One solution to properly load your specific data is to remove the problematic attributes before decoding:
df = xr.open_dataset(fname, decode_cf=False)
df.time.attrs.pop("scale_factor")
df.time.attrs.pop("add_offset")
df = xr.decode_cf(df) |
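The workaround above can be tried without the original file. Here is a minimal sketch on an in-memory dataset, assuming a synthetic int32 variable carrying the same kind of attributes as the one reported in this thread; the array values are invented, and the guard that only drops the attributes when they are the identity transform is an added precaution, not part of the suggestion above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the on-disk variable (hypothetical values).
raw = np.array([[0, 60], [3600, -2147483647]], dtype=np.int32)
attrs = {
    "_FillValue": np.int32(-2147483647),
    "units": "seconds since 1990-01-01 00:00:00",
    "scale_factor": 1.0,
    "add_offset": 0.0,
}
ds = xr.Dataset({"time": (("NUMROWS", "NUMCELLS"), raw, attrs)})

# Drop the packing attributes only if they really are a no-op, so that
# genuinely packed variables are left untouched.
time_attrs = ds["time"].attrs
if time_attrs.get("scale_factor") == 1.0 and time_attrs.get("add_offset") == 0.0:
    time_attrs.pop("scale_factor")
    time_attrs.pop("add_offset")

# Decode fill values and times with the integer data left as integers.
decoded = xr.decode_cf(ds)
first = decoded["time"].values[0, 0]
```

With a real file, the same steps would start from `xr.open_dataset(fname, decode_cf=False)` as in the comment above, followed by `xr.decode_cf`.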
What happened?
Reading or writing netCDF variables containing scale_factor and/or fill_value might raise the following error:
This problem might be related to the following changes: #7654.
What did you expect to happen?
I'm expecting it to work like it did before xarray 2024.03.0!
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-92-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.3.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
Nio: None
zarr: 2.17.2
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.4.1
distributed: 2024.4.1
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: 24.3.0
pytest: 8.1.1
mypy: 1.9.0
IPython: 8.22.2
sphinx: 7.3.5