Persisting datasets containing arrays with a single chunk fails #4882

Closed
sjperkins opened this issue Feb 8, 2021 · 2 comments · Fixed by #4884

@sjperkins

What happened:

Persisting a dataset containing dask arrays with a single chunk fails.

What you expected to happen:

This should succeed!

Minimal Complete Verifiable Example:

```python
import dask
import dask.array as da
import xarray as xr

data_vars = {
    "DATA": (("x", "y"), da.ones((1000, 1000), chunks=(1000, 1000))),
    "TIME": (("x",), da.ones((1000,), chunks=1000)),
}

dask.persist(xr.Dataset(data_vars))
```

```
$ python ~/tmp/python/dask/test_xarray_persist_fail.py
Traceback (most recent call last):
  File "/home/sperkins/tmp/python/dask/test_xarray_persist_fail.py", line 10, in <module>
    dask.persist(xr.Dataset(data_vars))
  File "/home/sperkins/venv/dask-ms/lib/python3.6/site-packages/dask/base.py", line 770, in persist
    results2 = [r({k: d[k] for k in ks}, *s) for r, ks, s in postpersists]
  File "/home/sperkins/venv/dask-ms/lib/python3.6/site-packages/dask/base.py", line 770, in <listcomp>
    results2 = [r({k: d[k] for k in ks}, *s) for r, ks, s in postpersists]
  File "/home/sperkins/venv/dask-ms/lib/python3.6/site-packages/xarray/core/dataset.py", line 877, in _dask_postpersist
    name = args2[1][0]
IndexError: tuple index out of range
```

Anything else we need to know?:

This occurred with the new dask 2021.02.0, and the immediate cause is probably dask/dask#7142, after which the collection key name is no longer passed as the first argument by default. xarray was likely relying on that convention: `_dask_postpersist` looks up the name with `args2[1][0]`, and that lookup now raises `IndexError`. This may be solvable by explicitly passing the dask collection name as an extra argument to the `__dask_postpersist__` / `__dask_postcompute__` callbacks, so that keys unrelated to the collection can be filtered out of each variable's graph.
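To make the failure mode and the proposed fix concrete, here is a minimal pure-Python sketch. It is a hypothetical, simplified model of the key-filtering step, not the actual xarray or dask source: the helpers `keys_for_name`, `filter_by_position` and `filter_by_explicit_name`, the toy graph keys `"data-abc"` / `"time-def"`, and the shape of `args2` (a rebuild function followed by the array's rebuild args) are all assumptions, with the positional lookup mirroring the `name = args2[1][0]` line in the traceback.

```python
from typing import Any, Dict, Hashable, Tuple

# Hypothetical, simplified model of the key-filtering step in
# Dataset._dask_postpersist; NOT the real xarray or dask implementation.
Key = Tuple[Hashable, ...]
Graph = Dict[Key, Any]


def keys_for_name(dsk: Graph, name: str) -> Graph:
    """Keep only graph entries whose key starts with the collection name."""
    return {k: v for k, v in dsk.items() if k[0] == name}


def filter_by_position(dsk: Graph, args2: Tuple[Any, ...]) -> Graph:
    # Fragile: assumes the collection name is the first element of the
    # rebuild args stored at args2[1]. If those args are an empty tuple,
    # the lookup raises IndexError: tuple index out of range.
    name = args2[1][0]
    return keys_for_name(dsk, name)


def filter_by_explicit_name(dsk: Graph, name: str) -> Graph:
    # Robust: the name is passed explicitly, so the layout of the
    # rebuild args no longer matters.
    return keys_for_name(dsk, name)


if __name__ == "__main__":
    dsk = {("data-abc", 0, 0): "chunk-a", ("time-def", 0): "chunk-b"}

    # Old convention: name is the first rebuild arg, so filtering works.
    old_style = ("rebuild", ("data-abc", ((1000,), (1000,))))
    print(filter_by_position(dsk, old_style))  # {('data-abc', 0, 0): 'chunk-a'}

    # New convention: rebuild args no longer carry the name.
    new_style = ("rebuild", ())
    try:
        filter_by_position(dsk, new_style)
    except IndexError as exc:
        print("IndexError:", exc)  # IndexError: tuple index out of range

    # Passing the name explicitly sidesteps the problem entirely.
    print(filter_by_explicit_name(dsk, "data-abc"))
```

The design point is simply that the filter should depend on an explicit name rather than on the positional layout of another library's rebuild arguments, which is free to change between releases.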

/cc @crusaderky @JSKenyon

Environment:

Ubuntu 18.04
dask: 2021.02.0

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
python-bits: 64
OS: Linux
OS-release: 5.4.0-64-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: None
libnetcdf: None

xarray: 0.16.2
pandas: 0.25.0
numpy: 1.19.5
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.02.0
distributed: 2.8.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 41.2.0
pip: 20.1
conda: None
pytest: 6.1.0
IPython: 7.11.1
sphinx: 1.8.5

@keewis (Collaborator) commented Feb 8, 2021

Thanks for the report, @sjperkins. This might be a duplicate of the automatically opened #4860, though.

@sjperkins (Author)

Thanks @keewis. Closing.

keewis linked a pull request on Feb 11, 2021 that will close this issue.