Skip to content

xr.full_like (often) fails when other is chunked and fill_value is non-scalar #3977

Closed
@Huite

Description

@Huite

I've been running into some issues when using xr.full_like, when my other.data is a chunked dask array, and the fill_value is a numpy array.

Now, I just checked, full_like mentions only scalar in the signature. However, this is a very convenient way to get all the coordinates and dimensions attached to an array like this, so it feels like desirable functionality. And as I mention below, both numpy and dask function similary, taking much more than just scalars.
https://xarray.pydata.org/en/stable/generated/xarray.full_like.html

MCVE Code Sample

x = [1, 2, 3, 4]
y = [1, 2, 3]
da1 = xr.DataArray(dask.array.ones((3, 4), chunks=(1, 4)), {"y": y, "x": x}, ("y", "x"))
da2 = xr.full_like(da1, np.ones((3, 4)))
print(da2.values)

This results in an error:
ValueError: could not broadcast input array from shape (1,3) into shape (1,4)

Expected Output

Expected is a DataArray with the dimensions and coords of other, and the numpy array of fill_value as its data.

Problem Description

The issue lies here:

xarray/xarray/core/common.py

Lines 1420 to 1436 in 2c77eb5

def _full_like_variable(other, fill_value, dtype: DTypeLike = None):
"""Inner function of full_like, where other must be a variable
"""
from .variable import Variable
if isinstance(other.data, dask_array_type):
import dask.array
if dtype is None:
dtype = other.dtype
data = dask.array.full(
other.shape, fill_value, dtype=dtype, chunks=other.data.chunks
)
else:
data = np.full_like(other, fill_value, dtype=dtype)
return Variable(dims=other.dims, data=data, attrs=other.attrs)

Calling dask.array.full with the given number of chunks results in it trying to to apply the fill_value for every individual chunk.

As one would expect, if I set fill_value to the size of a single chunk it doesn't error:

da2 = xr.full_like(da1, np.ones((1, 4)))
print(da2.values)

It does fail on a similarly chunked dask array (since it's applying it for every chunk):

da2 = xr.full_like(da1, dask.array.ones((3, 4)))
print(da2.values)

The most obvious solution would be to force it down the np.full_like route, since all the values already exist in memory anyway. So maybe another type check does the trick. However, full() accepts quite a variety of arguments for the fill value (scalars, numpy arrays, lists, tuples, ranges). The dask docs mention only a scalar in the signature for dask.array.full:
https://docs.dask.org/en/latest/array-api.html#dask.array.full
As does numpy.full:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html

However, in all cases, they still broadcast automatically...

a = np.full((2, 2), [1, 2]
>>> array([[1, 2],
       [1, 2]])

So kind of undefined behavior of a blocked full?

Versions

Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: en LOCALE: None.None libhdf5: 1.10.5 libnetcdf: 4.7.3

xarray: 0.15.1
pandas: 0.25.3
numpy: 1.17.5
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.9.2
distributed: 2.10.0
matplotlib: 3.1.2
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.13.0
sphinx: 2.3.1

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions