Description
I've been running into some issues when using xr.full_like
, when my other.data
is a chunked dask array, and the fill_value
is a numpy array.
Now, I just checked, full_like
mentions only scalar in the signature. However, this is a very convenient way to get all the coordinates and dimensions attached to an array like this, so it feels like desirable functionality. And as I mention below, both numpy and dask function similary, taking much more than just scalars.
https://xarray.pydata.org/en/stable/generated/xarray.full_like.html
MCVE Code Sample
x = [1, 2, 3, 4]
y = [1, 2, 3]
da1 = xr.DataArray(dask.array.ones((3, 4), chunks=(1, 4)), {"y": y, "x": x}, ("y", "x"))
da2 = xr.full_like(da1, np.ones((3, 4)))
print(da2.values)
This results in an error:
ValueError: could not broadcast input array from shape (1,3) into shape (1,4)
Expected Output
Expected is a DataArray with the dimensions and coords of other
, and the numpy array of fill_value
as its data.
Problem Description
The issue lies here:
Lines 1420 to 1436 in 2c77eb5
Calling dask.array.full
with the given number of chunks results in it trying to to apply the fill_value
for every individual chunk.
As one would expect, if I set fill_value
to the size of a single chunk it doesn't error:
da2 = xr.full_like(da1, np.ones((1, 4)))
print(da2.values)
It does fail on a similarly chunked dask array (since it's applying it for every chunk):
da2 = xr.full_like(da1, dask.array.ones((3, 4)))
print(da2.values)
The most obvious solution would be to force it down the np.full_like
route, since all the values already exist in memory anyway. So maybe another type check does the trick. However, full()
accepts quite a variety of arguments for the fill value (scalars, numpy arrays, lists, tuples, ranges). The dask docs mention only a scalar in the signature for dask.array.full
:
https://docs.dask.org/en/latest/array-api.html#dask.array.full
As does numpy.full:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html
However, in all cases, they still broadcast automatically...
a = np.full((2, 2), [1, 2]
>>> array([[1, 2],
[1, 2]])
So kind of undefined behavior of a blocked full
?
Versions
Output of `xr.show_versions()`
INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: en LOCALE: None.None libhdf5: 1.10.5 libnetcdf: 4.7.3xarray: 0.15.1
pandas: 0.25.3
numpy: 1.17.5
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.9.2
distributed: 2.10.0
matplotlib: 3.1.2
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.13.0
sphinx: 2.3.1