Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decode_cf called on mfdataset throws error: 'Array' object has no attribute 'tolist' #3215

Closed
aidanheerdegen opened this issue Aug 14, 2019 · 9 comments · Fixed by #3220
Closed
Labels

Comments

@aidanheerdegen
Copy link
Contributor

MCVE Code Sample

import xarray

file = 'temp_048.nc'

# Works ok with open_dataset
ds = xarray.open_dataset(file, decode_cf=True)
ds = xarray.open_dataset(file, decode_cf=False)
ds = xarray.decode_cf(ds)

# Fails with open_mfdataset
ds = xarray.open_mfdataset(file, decode_cf=True)
ds = xarray.open_mfdataset(file, decode_cf=False)
# This line throws an exception
ds = xarray.decode_cf(ds)

Expected Output

Nothing

Problem Description

When opening data with open_mfdataset calling decode_cf throws an error, when called as a separate step, but works as part of the open_mfdataset call.
Error is:

Traceback (most recent call last):
  File "tmp.py", line 11, in <module>
    ds = xarray.decode_cf(ds)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 479, in decode_cf
    decode_coords, drop_variables=drop_variables, use_cftime=use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 401, in decode_cf_variables
    stack_char_dim=stack_char_dim, use_cftime=use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 306, in decode_cf_variable
    var = coder.decode(var, name=name)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 419, in decode
    self.use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 90, in _decode_cf_datetime_dtype
    last_item(values) or [0]])
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/core/formatting.py", line 99, in last_item
    return np.ravel(array[indexer]).tolist()
AttributeError: 'Array' object has no attribute 'tolist'

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-957.21.3.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: C LOCALE: en_AU.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.25.0
numpy: 1.17.0
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: 1.5.5
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.7.1
iris: 2.2.1dev0
bottleneck: 1.2.1
dask: 2.2.0
distributed: 2.2.0
matplotlib: 2.2.4
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1.1
conda: installed
pytest: 5.0.1
IPython: 7.7.0
sphinx: None

There is no error using an older version of numpy with the same xarray version:

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-957.21.3.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: C LOCALE: en_AU.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.7
iris: 2.2.1dev0
bottleneck: 1.2.1
dask: 1.2.2
distributed: 1.28.1
matplotlib: 2.2.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1.1
conda: installed
pytest: 4.6.3
IPython: 7.5.0
sphinx: None

Looks like the tollst() method has disappeared from something, but even in the debugger it isn't obvious to me exactly why this is happening. I can call list on np.ravel(array[indexer]) at the same point and it works.

The netcdf file I am using can be recreated from this CDL dump

netcdf temp_048 {
dimensions:
        time = UNLIMITED ; // (5 currently)
        nv = 2 ;
variables:
        double average_T1(time) ;
                average_T1:long_name = "Start time for average period" ;
                average_T1:units = "days since 1958-01-01 00:00:00" ;
                average_T1:missing_value = 1.e+20 ;
                average_T1:_FillValue = 1.e+20 ;
        double time(time) ;
                time:long_name = "time" ;
                time:units = "days since 1958-01-01 00:00:00" ;
                time:cartesian_axis = "T" ;
                time:calendar_type = "GREGORIAN" ;
                time:calendar = "GREGORIAN" ;
                time:bounds = "time_bounds" ;
        double time_bounds(time, nv) ;
                time_bounds:long_name = "time axis boundaries" ;
                time_bounds:units = "days" ;
                time_bounds:missing_value = 1.e+20 ;
                time_bounds:_FillValue = 1.e+20 ;

// global attributes:
                :filename = "ocean.nc" ;
                :title = "MOM5" ;
                :grid_type = "mosaic" ;
                :grid_tile = "1" ;
                :history = "Wed Aug 14 16:38:53 2019: ncks -O -v average_T1 /g/data3/hh5/tmp/cosima/access-om2/1deg_jra55v13_iaf_spinup1_B1_lastcycle/output048/ocean/ocean.nc temp_048.nc" ;
                :NCO = "netCDF Operators version 4.7.7 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
data:

 average_T1 = 87659, 88024, 88389, 88754, 89119 ;

 time = 87841.5, 88206.5, 88571.5, 88936.5, 89301.5 ;

 time_bounds =
  87659, 88024,
  88024, 88389,
  88389, 88754,
  88754, 89119,
  89119, 89484 ;
}
@brianpm
Copy link

brianpm commented Aug 14, 2019

I'm having the same issue with xarray 0.12.3, numpy 1.17.0, python 3.7.3.

@DocOtak
Copy link
Contributor

DocOtak commented Aug 14, 2019

I think this is being thrown by dask, here is an even more minimal example:

>>> import dask as da
>>> da.array.from_array([]).tolist()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Array' object has no attribute 'tolist'

@aidanheerdegen
Copy link
Contributor Author

Seems that this is by design, from here

Dask Array doesn’t implement operations like tolist that would be very inefficient for larger datasets. Likewise, it is very inefficient to iterate over a Dask array with for loops

So dask has never had a tolist method, so in one case the object is a dask array, but not in the other case.

I still don't understand why it fails when decode_cf is called separately. Suggests there is a different code path as all underlying packages are identical.

@shoyer
Copy link
Member

shoyer commented Aug 15, 2019

This is definitely due to NumPy 1.17 / __array_function__, which means that np.ravel now calls out to a dask function rather than coercing to NumPy.

I think the right fix is to add explicit casting to NumPy, e.g., np.ravel(np.asarray(array[indexer]))

@shoyer
Copy link
Member

shoyer commented Aug 15, 2019

A short term work around is to set the environment variable NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 before importing NumPy.

@aidanheerdegen
Copy link
Contributor Author

Thanks @shoyer.

I still don't understand the different code paths between decode_cf=True and decode_cf=False + explicit call to xarray.decode_cf()

@shoyer
Copy link
Member

shoyer commented Aug 15, 2019

I still don't understand the different code paths between decode_cf=True and decode_cf=False + explicit call to xarray.decode_cf()

In the first case, CF conventions decoding is done on xarray's internal lazy BackendArray objects.

In the second case, CF conventions decoding is done on dask arrays. There are a few cases where this can be slower / require loading more data to look at a small slice of array data.

@aidanheerdegen
Copy link
Contributor Author

Thanks for the explanation.

@aidanheerdegen
Copy link
Contributor Author

aidanheerdegen commented Aug 15, 2019

Confirmed that NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 fixes this issue for me.

Have submitted a PR with your suggested fix #3220

Confirmed the submitted code fixes my issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants