Cannot open dataset with empty list units #6781

Closed
2 of 4 tasks
antscloud opened this issue Jul 13, 2022 · 6 comments

@antscloud
Contributor

What happened?

I had a netCDF file with empty units, and xarray's open_dataset failed on it while decoding the CF conventions.
I was able to reproduce the bug: it occurs specifically when the units attribute is an empty list (see the Minimal Complete Verifiable Example).

What did you expect to happen?

Perhaps the units attribute should be parsed as an empty string?

Minimal Complete Verifiable Example

import numpy as np
import pandas as pd
import xarray as xr

temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]

# for real use cases, it's good practice to supply array attributes such as
# units, but we won't bother here for the sake of brevity
ds = xr.Dataset(
    {
        "temperature": (["x", "y", "time"], temp),
        "precipitation": (["x", "y", "time"], precip),
    },
    coords={
        "lon": (["x", "y"], lon),
        "lat": (["x", "y"], lat),
        "time": pd.date_range("2014-09-06", periods=3),
        "reference_time": pd.Timestamp("2014-09-05"),
    },
)
# an empty list as the units attribute is what triggers the failure on read
ds.temperature.attrs["units"] = []

ds.to_netcdf("test.nc")

# fails with the TypeError shown in the log output below
ds = xr.open_dataset("test.nc")
ds.close()

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 ds = xr.open_dataset("test.nc")
      2 print(ds["temperature"].attrs)
      3 ds.close()

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/backends/api.py:495, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483 decoders = _resolve_decoders_kwargs(
    484     decode_cf,
    485     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    491     decode_coords=decode_coords,
    492 )
    494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
    496     filename_or_obj,
    497     drop_variables=drop_variables,
    498     **decoders,
    499     **kwargs,
    500 )
    501 ds = _dataset_from_backend_dataset(
    502     backend_ds,
    503     filename_or_obj,
   (...)
    510     **kwargs,
    511 )
    512 return ds

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:564, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    562 store_entrypoint = StoreBackendEntrypoint()
    563 with close_on_error(store):
--> 564     ds = store_entrypoint.open_dataset(
    565         store,
    566         mask_and_scale=mask_and_scale,
    567         decode_times=decode_times,
    568         concat_characters=concat_characters,
    569         decode_coords=decode_coords,
    570         drop_variables=drop_variables,
    571         use_cftime=use_cftime,
    572         decode_timedelta=decode_timedelta,
    573     )
    574 return ds

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/backends/store.py:27, in StoreBackendEntrypoint.open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
     24 vars, attrs = store.load()
     25 encoding = store.get_encoding()
---> 27 vars, attrs, coord_names = conventions.decode_cf_variables(
     28     vars,
     29     attrs,
     30     mask_and_scale=mask_and_scale,
     31     decode_times=decode_times,
     32     concat_characters=concat_characters,
     33     decode_coords=decode_coords,
     34     drop_variables=drop_variables,
     35     use_cftime=use_cftime,
     36     decode_timedelta=decode_timedelta,
     37 )
     39 ds = Dataset(vars, attrs=attrs)
     40 ds = ds.set_coords(coord_names.intersection(vars))

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/conventions.py:503, in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta)
    499     continue
    500 stack_char_dim = (
    501     concat_characters and v.dtype == "S1" and v.ndim > 0 and stackable(v.dims[-1])
    502 )
--> 503 new_vars[k] = decode_cf_variable(
    504     k,
    505     v,
    506     concat_characters=concat_characters,
    507     mask_and_scale=mask_and_scale,
    508     decode_times=decode_times,
    509     stack_char_dim=stack_char_dim,
    510     use_cftime=use_cftime,
    511     decode_timedelta=decode_timedelta,
    512 )
    513 if decode_coords in [True, "coordinates", "all"]:
    514     var_attrs = new_vars[k].attrs

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/conventions.py:354, in decode_cf_variable(name, var, concat_characters, mask_and_scale, decode_times, decode_endianness, stack_char_dim, use_cftime, decode_timedelta)
    351         var = coder.decode(var, name=name)
    353 if decode_timedelta:
--> 354     var = times.CFTimedeltaCoder().decode(var, name=name)
    355 if decode_times:
    356     var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/coding/times.py:537, in CFTimedeltaCoder.decode(self, variable, name)
    534 def decode(self, variable, name=None):
    535     dims, data, attrs, encoding = unpack_for_decoding(variable)
--> 537     if "units" in attrs and attrs["units"] in TIME_UNITS:
    538         units = pop_to(attrs, encoding, "units")
    539         transform = partial(decode_cf_timedelta, units=units)

TypeError: unhashable type: 'numpy.ndarray'

Anything else we need to know?

The following assignment triggers the bug:

ds.temperature.attrs["units"] = []

But these do not trigger the bug:

ds.temperature.attrs["units"] = "[]"
ds.temperature.attrs["units"] = ""

Also, I don't know how the units attribute gets encoded for writing, but ncdump shows no difference in the file between ds.temperature.attrs["units"] = "" and ds.temperature.attrs["units"] = [].

Environment

This bug was encountered with the versions listed below.

INSTALLED VERSIONS

commit: None
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.6.1

xarray: 0.20.1
pandas: 1.4.3
numpy: 1.22.3
scipy: None
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 61.2.0
pip: 22.1.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: None

@antscloud added the "bug" and "needs triage" labels on Jul 13, 2022
@kmuehlbauer
Contributor

@antscloud As a workaround you could use keyword argument decode_cf=False in the call to xr.open_dataset. After fixing the units attribute to some reasonable value you can call ds = xr.decode_cf(ds).

@antscloud
Contributor Author

> @antscloud As a workaround you could use keyword argument decode_cf=False in the call to xr.open_dataset. After fixing the units attribute to some reasonable value you can call ds = xr.decode_cf(ds).

Thank you, I'll do this. In this particular case, one could simply loop over the variables' attributes and replace [] with an empty string.

@dcherian removed the "needs triage" label on Jul 13, 2022
@dcherian
Contributor

I guess we could take a PR to change

if "units" in attrs and attrs["units"] in TIME_UNITS:

to

if "units" in attrs and isinstance(attrs["units"], str) and attrs["units"] in TIME_UNITS:
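To illustrate why the extra isinstance check would help, here is a small standalone sketch. It rests on some assumptions: the TIME_UNITS set below is a stand-in with a few unit names rather than xarray's actual constant, and a plain list stands in for the numpy array from the traceback, since both are unhashable and fail a set membership test in the same way.

```python
# Stand-in for xarray's TIME_UNITS (assumed here to be a set-like collection
# of unit strings; the real contents may differ).
TIME_UNITS = frozenset({"days", "hours", "minutes", "seconds"})


def is_time_unit_unguarded(attrs):
    """Original check: hashes attrs["units"], so unhashable values raise."""
    return "units" in attrs and attrs["units"] in TIME_UNITS


def is_time_unit_guarded(attrs):
    """Proposed check: only strings ever reach the set membership test."""
    return (
        "units" in attrs
        and isinstance(attrs["units"], str)
        and attrs["units"] in TIME_UNITS
    )


bad_attrs = {"units": []}  # a list, like an ndarray, is unhashable

try:
    is_time_unit_unguarded(bad_attrs)
    unguarded_raised = False
except TypeError:
    # membership in a set requires hashing the candidate value
    unguarded_raised = True

guarded_result = is_time_unit_guarded(bad_attrs)  # no exception is raised
```

The guarded version short-circuits before the membership test, so a non-string units value is simply treated as "not a time unit" instead of aborting the whole decode.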

@antscloud
Contributor Author

I was wondering why the units attribute is parsed this way in the first place.
It seems that the attribute is converted to a Python object (a list). Is it xarray that does this, or the netCDF4 bindings?

If it's xarray, wouldn't it be better to just not parse it?

@dcherian
Contributor

It checks the units attribute to see whether the variable can be decoded as a time variable.

@dcherian
Contributor

dcherian commented Oct 3, 2022

I think this is now fixed by #7085 (thanks @ghislainp).

@dcherian dcherian closed this as completed Oct 3, 2022