What is your issue?
@keewis pointed out that it's weird that xarray.apply_ufunc supports passing numpy/dask arrays directly, and I'm inclined to agree. I don't understand why we do, and think we should consider removing that feature.
Two arguments in favour of removing it:
- It exposes users to transposition errors
Consider this example:
In [1]: import xarray as xr
In [2]: import numpy as np
In [3]: arr = np.arange(12).reshape(3, 4)
In [4]: def mean(obj, dim):
   ...:     # note: apply always moves core dimensions to the end
   ...:     return xr.apply_ufunc(
   ...:         np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
   ...:     )
   ...:
In [5]: mean(arr, dim='time')
Out[5]: array([1.5, 5.5, 9.5])
In [6]: mean(arr.T, dim='time')
Out[6]: array([4., 5., 6., 7.])
Transposing the input leads to a different result, with the value of the dim kwarg effectively ignored. This kind of error is what xarray code is supposed to prevent by design.
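To make the "effectively ignored" point concrete, here is a minimal sketch reusing the mean helper from above. The name "space" is just a second label picked for illustration. Because a bare array carries no dimension names, any label passed as dim should reduce over the last axis:

```python
import numpy as np
import xarray as xr


def mean(obj, dim):
    # Same helper as in the example above: reduce over the core dimension,
    # which apply_ufunc expects to be the last axis.
    return xr.apply_ufunc(
        np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
    )


arr = np.arange(12).reshape(3, 4)

# A bare array has no names, so neither label can be matched to an axis.
by_time = mean(arr, dim="time")
by_space = mean(arr, dim="space")

# Both calls reduce over the last axis, whatever the label says.
same = np.array_equal(by_time, by_space)
print(same)
```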
- There is an alternative input pattern that doesn't require accepting bare arrays
Instead, any numpy/dask array can just be wrapped up into an xarray Variable/NamedArray before passing it to apply_ufunc.
In [7]: from xarray.core.variable import Variable
In [8]: var = Variable(data=arr, dims=['time', 'space'])
In [9]: mean(var, dim='time')
Out[9]:
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])
In [10]: mean(var.T, dim='time')
Out[10]:
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])
This now guards against the transposition error, and puts the onus on the user to be clear about which axes of their array correspond to which dimension.
With Variable/NamedArray as public API, this latter pattern can handle every case that passing bare arrays in could.
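As a rough sketch of how a user who wants bare arrays in and out could migrate: wrap on the way in, name the axes explicitly, and unwrap with .data on the way out. The helper name mean_over and its dims argument are hypothetical, made up for this illustration only.

```python
import numpy as np
import xarray as xr


def mean_over(arr, dims, dim):
    """Hypothetical helper: bare array in, bare array out, names in between."""
    # The caller must say which axis is which, so a transposed input
    # has to arrive with correspondingly transposed dims.
    var = xr.Variable(dims=dims, data=arr)
    out = xr.apply_ufunc(
        np.mean, var, input_core_dims=[[dim]], kwargs={"axis": -1}
    )
    # Unwrap back to the underlying array.
    return out.data


arr = np.arange(12).reshape(3, 4)

over_time = mean_over(arr, dims=["time", "space"], dim="time")
over_time_transposed = mean_over(arr.T, dims=["space", "time"], dim="time")
over_space = mean_over(arr, dims=["time", "space"], dim="space")
```

Here the transposed call gives the same answer as the untransposed one, and changing dim changes which axis is reduced.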
I suggest we deprecate accepting bare arrays in favour of having users wrap them in Variable/NamedArray/DataArray objects instead.
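For discussion, one possible shape for the deprecation is sketched below as a thin wrapper around apply_ufunc. This is not an existing xarray function: the wrapper name, the warning text, and the use of np.ndim to let raw scalars through are all assumptions of mine, and a real implementation would presumably live inside apply_ufunc itself and handle the other accepted input types (e.g. groupby objects).

```python
import warnings

import numpy as np
import xarray as xr


def apply_ufunc_no_bare_arrays(func, *args, **kwargs):
    """Hypothetical wrapper: warn when a bare array is passed to apply_ufunc."""
    for arg in args:
        is_labelled = isinstance(arg, (xr.Variable, xr.DataArray, xr.Dataset))
        # Raw scalars (ndim == 0) stay allowed: they cannot be transposed.
        if not is_labelled and np.ndim(arg) > 0:
            warnings.warn(
                "passing a bare array to apply_ufunc is deprecated; "
                "wrap it in a Variable or DataArray instead",
                DeprecationWarning,
                stacklevel=2,
            )
    return xr.apply_ufunc(func, *args, **kwargs)


arr = np.arange(12).reshape(3, 4)
var = xr.Variable(dims=["time", "space"], data=arr)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Bare array: still works, but should warn once.
    apply_ufunc_no_bare_arrays(np.mean, arr, kwargs={"axis": -1})
    # Wrapped array: should not add a warning from this wrapper.
    result = apply_ufunc_no_bare_arrays(
        np.mean, var, input_core_dims=[["time"]], kwargs={"axis": -1}
    )

bare_warnings = [w for w in caught if "bare array" in str(w.message)]
```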
(Note 1: We also accept raw scalars, but this doesn't expose anyone to transposition errors.)
(Note 2: In a quick scan of the apply_ufunc docstring, the docs on it in computation.rst, and the extensive guide that @dcherian wrote in the xarray tutorial repository, I can't see any examples that actually pass bare arrays to apply_ufunc.)