
Multidimensional dask coordinates unexpectedly computed #3068

Closed
@djhoese


MCVE Code Sample

from dask.diagnostics import ProgressBar
import xarray as xr
import numpy as np
import dask.array as da

# Two lazy DataArrays, each with its own 2D dask "lons" coordinate
a = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10),
                         'lons': (('y', 'x'), da.zeros((10, 10), chunks=2))})
b = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10),
                         'lons': (('y', 'x'), da.zeros((10, 10), chunks=2))})

# The progress bar prints only if dask actually executes a graph
with ProgressBar():
    c = a + b

Output:

[########################################] | 100% Completed |  0.1s

Problem Description

Using DataArrays with 2D dask array coordinates causes those coordinates to be computed during any binary operation (anything combining two or more DataArrays). I use ProgressBar in the example above to show when the coordinates are being computed.
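ProgressBar makes a computation visible, but it is awkward to check programmatically. A minimal sketch, assuming dask's default local schedulers (no distributed Client): dask.callbacks.Callback accepts a start hook that runs each time a graph begins executing, so counting those calls gives the number of computations an operation triggered. make_array and count_computes are hypothetical helpers, not part of xarray or dask.

```python
import dask.array as da
import numpy as np
import xarray as xr
from dask.callbacks import Callback


def make_array():
    """Hypothetical helper: the same construction as a and b in the MCVE."""
    return xr.DataArray(
        da.zeros((10, 10), chunks=2),
        dims=("y", "x"),
        coords={
            "y": np.arange(10),
            "x": np.arange(10),
            "lons": (("y", "x"), da.zeros((10, 10), chunks=2)),
        },
    )


def count_computes(func):
    """Hypothetical helper: call func() and return (result, number of dask
    graph executions that started while it ran)."""
    started = []
    # The start hook fires once per graph executed by a local scheduler.
    with Callback(start=lambda dsk: started.append(dsk)):
        result = func()
    return result, len(started)


a, b = make_array(), make_array()
c, n_computes = count_computes(lambda: a + b)
print(f"a + b triggered {n_computes} dask computation(s)")
```

A count of 0 would mean the binary operation stayed fully lazy; any positive count means something (here, most likely the lons coordinate) was evaluated.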

In my own work, once I learned that 2D dask coordinates were possible, I started adding longitude and latitude coordinates. These are rather large and can take a while to load or compute, so I was surprised that simple operations (e.g. a.fillna(b)) were triggering computation and taking a long time.
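If the 2D coordinate only needs to come from one operand, one possible workaround (an assumption, not a fix confirmed in this issue) is to drop it from the other operand before combining them, using DataArray.reset_coords with drop=True. With only one copy of lons there is nothing to compare, so xarray should have no reason to load it, and the result keeps the lazy lons from a. make_array is a hypothetical helper reproducing the MCVE arrays.

```python
import dask.array as da
import numpy as np
import xarray as xr


def make_array():
    """Hypothetical helper: the same construction as a and b in the MCVE."""
    return xr.DataArray(
        da.zeros((10, 10), chunks=2),
        dims=("y", "x"),
        coords={
            "y": np.arange(10),
            "x": np.arange(10),
            "lons": (("y", "x"), da.zeros((10, 10), chunks=2)),
        },
    )


a, b = make_array(), make_array()

# Remove the 2D "lons" coordinate from the second operand only.
b_no_lons = b.reset_coords("lons", drop=True)

c = a + b_no_lons             # binary arithmetic
filled = a.fillna(b_no_lons)  # the operation mentioned above

# Both results carry a's "lons" coordinate, still backed by a dask array.
print(type(c.coords["lons"].data), type(filled.coords["lons"].data))
```

The trade-off is that the result no longer reflects b's lons at all, so this only makes sense when both operands are known to share the same geolocation.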

Is this computation by design or a possible bug?
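One likely reason for the computation (a reading of xarray's merge rules, not an answer given in this issue): when both operands carry a non-index coordinate with the same name, xarray keeps it in the result only if the two copies are equal and drops it if they conflict. Deciding which case applies means comparing values, which for dask-backed coordinates generally means computing them. A small NumPy-only illustration, where with_lons is a hypothetical helper:

```python
import numpy as np
import xarray as xr


def with_lons(lons):
    """Hypothetical helper: a 2x2 DataArray with a 2D "lons" coordinate."""
    return xr.DataArray(
        np.zeros((2, 2)),
        dims=("y", "x"),
        coords={"y": [0, 1], "x": [0, 1], "lons": (("y", "x"), lons)},
    )


same = with_lons(np.zeros((2, 2))) + with_lons(np.zeros((2, 2)))
conflict = with_lons(np.zeros((2, 2))) + with_lons(np.ones((2, 2)))

print("lons" in same.coords)      # equal copies: coordinate kept
print("lons" in conflict.coords)  # conflicting copies: coordinate dropped
```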

Expected Output

No output from the ProgressBar; I was hoping that no coordinates would be computed or loaded.
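To turn this expectation into a check that fails loudly instead of relying on reading ProgressBar output, a custom scheduler can be installed with dask.config.set(scheduler=...), which accepts a callable. A minimal sketch; CountingScheduler and raise_if_dask_computes are hypothetical names, and the scheduler simply delegates to dask's synchronous dask.get after counting.

```python
import dask
import dask.array as da


class CountingScheduler:
    """Hypothetical helper: counts dask graph executions and raises once
    more than max_computes have been requested."""

    def __init__(self, max_computes=0):
        self.total_computes = 0
        self.max_computes = max_computes

    def __call__(self, dsk, keys, **kwargs):
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                f"Too many dask computations: {self.total_computes} "
                f"(allowed: {self.max_computes})"
            )
        # Delegate the real work to dask's single-threaded scheduler.
        return dask.get(dsk, keys, **kwargs)


def raise_if_dask_computes(max_computes=0):
    """Context manager that installs a CountingScheduler for its block."""
    return dask.config.set(scheduler=CountingScheduler(max_computes))


# Building a lazy graph is allowed; only computing it would raise.
with raise_if_dask_computes():
    lazy_total = da.ones((4, 4), chunks=2).sum()
```

Wrapping the c = a + b line of the MCVE in with raise_if_dask_computes(): instead of with ProgressBar(): would then raise a RuntimeError whenever a coordinate is computed.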

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 02:16:08)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.14.3
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: 1.0.22
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.0.0
distributed: 2.0.0
matplotlib: 3.1.0
cartopy: 0.17.1.dev147+HEAD.detached.at.5e624fe
seaborn: None
setuptools: 41.0.1
pip: 19.1.1
conda: None
pytest: 4.6.3
IPython: 7.5.0
sphinx: 2.1.2
