
apply_ufunc erroneously operating on an empty array when dask is used #3168

Closed
@TomNicholas


Problem description

apply_ufunc with dask='parallelized' appears to call the wrapped function on an empty NumPy array at the moment the computation is defined, before .compute() is called. In other words, a ufunc that just prints the shape of its argument prints (0, 0) as soon as the lazy result is created, and then prints the correct shape once .compute() is called.
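
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: recent dask versions (the report uses dask 2.1.0) may infer the type of a lazy result by calling the function once on zero-size arrays that keep the inputs' number of dimensions. The sketch below emulates that idea with NumPy alone. `infer_meta_by_probing` is a hypothetical helper written for illustration; it is not dask's actual implementation.

```python
import numpy as np

# Records every shape the function is called with, so the probe call is visible.
seen_shapes = []


def example_ufunc(x):
    seen_shapes.append(x.shape)  # stands in for print(x.shape) in the report
    return np.mean(x, axis=-1)


def infer_meta_by_probing(func, *arrays):
    """Hypothetical helper: call func on zero-size stand-ins for the inputs.

    Each stand-in keeps the input's ndim and dtype but has length 0 along
    every axis, so a 2-D input becomes an array of shape (0, 0).
    """
    probes = [np.empty((0,) * a.ndim, dtype=a.dtype) for a in arrays]
    # As in the report, np.mean emits "RuntimeWarning: Mean of empty slice." here.
    return func(*probes)


data = np.random.rand(2, 3)

# "Defining" the lazy computation triggers one probe call on a (0, 0) array.
meta = infer_meta_by_probing(example_ufunc, data)

# The real computation later sees the actual block shape.
result = example_ufunc(data)

print(seen_shapes)  # [(0, 0), (2, 3)]
print(meta.shape, meta.dtype)  # (0,) float64
```

Under this assumption the probe only needs the output's dtype and number of dimensions, which is why a zero-size result is enough for it, but any side effects in the function (printing, warnings) still happen.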

Minimum working example

```python
import numpy as np
import xarray as xr


def example_ufunc(x):
    print(x.shape)  # shows which arrays the function is actually called on
    return np.mean(x, axis=-1)


def new_mean(da, dim):
    result = xr.apply_ufunc(example_ufunc, da,
                            input_core_dims=[[dim]], dask='parallelized',
                            output_dtypes=[da.dtype])
    return result


shape = {'t': 2, 'x': 3}
data = xr.DataArray(data=np.random.rand(*shape.values()), dims=list(shape))
unchunked = data
chunked = data.chunk(shape)


actual = new_mean(chunked, dim='x')  # prints (0, 0) and raises the warning, before any compute
print(actual)

print(actual.compute())  # does the computation correctly
```

Result

```
(0, 0)
/home/tnichol/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
<xarray.DataArray (t: 2)>
dask.array<shape=(2,), dtype=float64, chunksize=(2,)>
Dimensions without coordinates: t
(2, 3)
<xarray.DataArray (t: 2)>
array([0.147205, 0.402913])
Dimensions without coordinates: t
```
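
The RuntimeWarning in this output can be reproduced with NumPy alone, which suggests it comes from np.mean being handed a zero-size array rather than from the real (2, 3) data. This is a minimal check and does not involve xarray or dask:

```python
import warnings

import numpy as np

# A 2-D array with zero length along both axes, like the (0, 0) array printed in the report.
empty = np.empty((0, 0))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not deduplicated
    out = np.mean(empty, axis=-1)

print(out.shape)  # (0,)
print([str(w.message) for w in caught])  # the messages include 'Mean of empty slice.'
```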

Expected result

The same output, but without the (0, 0) line or the NumPy RuntimeWarning.
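
Until the underlying behaviour changes, one possible workaround (a sketch, not an official fix) is to make the wrapped function tolerate an empty core dimension, returning a NaN-filled result of the right shape without calling np.mean. The name `safe_example_ufunc` is hypothetical; it would be passed to xr.apply_ufunc in place of `example_ufunc`. Note that this only silences the warning: the function is presumably still called on the (0, 0) array, so a print inside it would still show that shape.

```python
import warnings

import numpy as np


def safe_example_ufunc(x):
    """Mean over the last (core) axis that avoids NumPy's empty-slice warning."""
    if x.shape[-1] == 0:
        # Nothing to average: return NaNs with the core axis dropped. For a
        # (0, 0) input this is simply an empty array of shape (0,). The dtype
        # rule (keep float/complex dtypes, otherwise float64) mirrors np.mean.
        out_dtype = x.dtype if np.issubdtype(x.dtype, np.inexact) else np.float64
        return np.full(x.shape[:-1], np.nan, dtype=out_dtype)
    return np.mean(x, axis=-1)


with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn any warning into an exception
    probe = safe_example_ufunc(np.empty((0, 0)))
    data = np.random.rand(2, 3)
    real = safe_example_ufunc(data)

print(probe.shape)  # (0,)
print(np.allclose(real, data.mean(axis=-1)))  # True
```

Checking `x.shape[-1]` rather than `x.size` keeps blocks such as shape (0, 3) on the normal np.mean path, since averaging over a non-empty core axis does not warn.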

Output of xr.show_versions()

(My xarray is up to date with master.)

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.14.4.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.1

xarray: 0.12.3+23.g1d7bcbd
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.0
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 40.6.2
pip: 18.1
conda: None
pytest: 4.0.0
IPython: 7.1.1
sphinx: 1.8.2
