Why does xr.apply_ufunc support numpy/dask.arrays? #8995

Open
@TomNicholas

What is your issue?

@keewis pointed out that it's weird that xarray.apply_ufunc supports passing numpy/dask arrays directly, and I'm inclined to agree. I don't understand why we support this, and I think we should consider removing the feature.

Two arguments in favour of removing it:

  1. It exposes users to transposition errors

Consider this example:

In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: arr = np.arange(12).reshape(3, 4)

In [4]: def mean(obj, dim):
   ...:     # note: apply_ufunc always moves core dimensions to the end
   ...:     return xr.apply_ufunc(
   ...:         np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
   ...:     )
   ...: 

In [5]: mean(arr, dim='time')
Out[5]: array([1.5, 5.5, 9.5])

In [6]: mean(arr.T, dim='time')
Out[6]: array([4., 5., 6., 7.])

Transposing the input leads to a different result, with the value of the dim kwarg effectively ignored. This kind of error is what xarray code is supposed to prevent by design.
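
One way to see that the name is ignored is to pass a dimension name that cannot possibly be right. This is a minimal sketch, assuming the behaviour described in this issue (bare arrays are handed straight to the wrapped function); `mean` is the same wrapper as in the session above, and the dimension names are arbitrary.

```python
import numpy as np
import xarray as xr


def mean(obj, dim):
    # apply_ufunc moves core dimensions to the end, so reduce over the last axis
    return xr.apply_ufunc(
        np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
    )


arr = np.arange(12).reshape(3, 4)

# A bare array carries no dimension names to match `dim` against,
# so every choice of `dim` ends up reducing over the last axis.
by_time = mean(arr, dim="time")
by_space = mean(arr, dim="space")
by_nonsense = mean(arr, dim="not_a_dimension")

print(by_time, by_space, by_nonsense)
```

If the three results agree, the `dim` argument had no effect on the computation, which is exactly the failure mode described above.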

  2. There is an alternative input pattern that doesn't require accepting bare arrays

Instead, any numpy/dask array can just be wrapped up into an xarray Variable/NamedArray before passing it to apply_ufunc.

In [7]: from xarray import Variable

In [8]: var = Variable(data=arr, dims=['time', 'space'])

In [9]: mean(var, dim='time')
Out[9]: 
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])

In [10]: mean(var.T, dim='time')
Out[10]: 
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])

This now guards against the transposition error, and puts the onus on the user to be clear about which axes of their array correspond to which dimension.

With Variable/NamedArray as public API, this latter pattern can handle every case that passing bare arrays could.

I suggest we deprecate accepting bare arrays in favour of having users wrap them in Variable/NamedArray/DataArray objects instead.
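
For user code that currently passes bare arrays, the migration could be as small as the sketch below. `as_named` is a hypothetical helper written for illustration, not an existing xarray function; it only forwards to the public `xr.Variable` constructor, which already raises a `ValueError` when the number of names does not match the number of array axes.

```python
import numpy as np
import xarray as xr


def mean(obj, dim):
    # apply_ufunc moves core dimensions to the end, so reduce over the last axis
    return xr.apply_ufunc(
        np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
    )


def as_named(data, dims):
    """Hypothetical helper: attach explicit dimension names to a bare array."""
    # xr.Variable(dims, data) checks that len(dims) matches the array's ndim
    return xr.Variable(dims, data)


arr = np.arange(12).reshape(3, 4)

# The caller states once which axis is which ...
var = as_named(arr, ("time", "space"))

# ... and the reduction then follows the name, not the axis position.
result = mean(var, dim="time")
result_transposed = mean(var.T, dim="time")

print(result.dims, result.values)
print(result_transposed.dims, result_transposed.values)
```

The same wrapping works for dask arrays, since the helper never converts the data; whether a dedicated helper is worth having, or users should call `xr.Variable` directly, is a separate design question.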

(Note 1: We also accept raw scalars, but this doesn't expose anyone to transposition errors.)

(Note 2: In a quick scan of the apply_ufunc docstring, the docs on it in computation.rst, and the extensive guide that @dcherian wrote in the xarray tutorial repository, I can't see any examples that actually pass bare arrays to apply_ufunc.)
