Problem description
apply_ufunc with dask='parallelized' appears to be trying to act on an empty numpy array when the computation is specified, but before .compute() is called. In other words, a ufunc which just prints the shape of its argument will print (0, 0) then print the correct shape once .compute() is called.
Minimum working example
import numpy as np
import xarray as xr


def example_ufunc(x):
    print(x.shape)
    return np.mean(x, axis=-1)


def new_mean(da, dim):
    result = xr.apply_ufunc(example_ufunc, da,
                            input_core_dims=[[dim]], dask='parallelized',
                            output_dtypes=[da.dtype])
    return result


shape = {'t': 2, 'x': 3}
data = xr.DataArray(data=np.random.rand(*shape.values()), dims=shape.keys())
unchunked = data
chunked = data.chunk(shape)

actual = new_mean(chunked, dim='x')  # raises the warning
print(actual)
print(actual.compute())  # does the computation correctly
Result
(0, 0)
/home/tnichol/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
<xarray.DataArray (t: 2)>
dask.array<shape=(2,), dtype=float64, chunksize=(2,)>
Dimensions without coordinates: t
(2, 3)
<xarray.DataArray (t: 2)>
array([0.147205, 0.402913])
Dimensions without coordinates: t
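The warning on its own can be reproduced with numpy alone, which suggests it comes purely from the ufunc being handed a zero-size array. This is a minimal sketch, assuming the spurious call receives an array of shape (0, 0) as the printed shape indicates; it does not show where in xarray or dask that call originates.

```python
import warnings

import numpy as np


def example_ufunc(x):
    # Same reduction as in the example above, without the print.
    return np.mean(x, axis=-1)


# Assumption: the spurious call passes a zero-size array of shape (0, 0),
# as the printed shape in the output suggests.
empty = np.empty((0, 0))

# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = example_ufunc(empty)

messages = [str(w.message) for w in caught]
print(result.shape)
print(messages)
```

The recorded messages should include numpy's "Mean of empty slice." RuntimeWarning, matching the one in the output above.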
Expected result
Same thing without the (0, 0) or the numpy warning.
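In the meantime, one possible stopgap is to guard the ufunc against zero-size input. This is only a sketch, not a fix for the underlying behaviour: guarded_mean is a hypothetical name, and it assumes the early call only needs an output of the right shape and dtype.

```python
import warnings

import numpy as np


def guarded_mean(x):
    # Hypothetical workaround: for zero-size input, skip the reduction and
    # return an empty array with the core dimension dropped.
    if x.size == 0:
        return np.empty(x.shape[:-1], dtype=x.dtype)
    return np.mean(x, axis=-1)


# Check that neither the empty nor the regular call emits a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_result = guarded_mean(np.empty((0, 0)))
    full_result = guarded_mean(np.ones((2, 3)))

print(empty_result.shape, full_result.shape, len(caught))
```

Whether this is enough inside apply_ufunc depends on what the early call expects back, which I have not verified.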
Output of xr.show_versions()
(my xarray is up-to-date with master)
xarray: 0.12.3+23.g1d7bcbd
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.0
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 40.6.2
pip: 18.1
conda: None
pytest: 4.0.0
IPython: 7.1.1
sphinx: 1.8.2