Description
What is your issue?
Reported together with @paulina-t, who found a bug caused by the behavior described here in a codebase, where it was badly messing things up.
See the example notebook at https://github.com/jerabaul29/public_bug_reports/blob/main/xarray/2023_10_18/interp.ipynb .
Problem
Interpolating, or finding the nearest neighbor to a query point, is always a bit risky: bad things can happen when the query point lies outside of the area that the data represent. Fortunately, xarray returns NaN when performing interp outside of the bounds of a dataset:
import xarray as xr
import numpy as np
xr.__version__
'2023.9.0'
data = np.array([[1, 2, 3], [4, 5, 6]])
lat = [10, 20]
lon = [120, 130, 140]
data_xr = xr.DataArray(data, coords={'lat':lat, 'lon':lon}, dims=['lat', 'lon'])
data_xr
<xarray.DataArray (lat: 2, lon: 3)>
array([[1, 2, 3],
       [4, 5, 6]])
Coordinates:
  * lat      (lat) int64 10 20
  * lon      (lon) int64 120 130 140
# interp is civilized: inside the bounds it interpolates, and outside it returns NaN rather than wildly extrapolating
data_xr.interp(lat=15, lon=125)
<xarray.DataArray ()>
array(3.)
Coordinates:
    lat      int64 15
    lon      int64 125
data_xr.interp(lat=5, lon=125)
<xarray.DataArray ()>
array(nan)
Coordinates:
    lat      int64 5
    lon      int64 125
Unfortunately, .sel will happily find the nearest neighbor of a point, even if the input point is outside of the dataset range:
# sel is not as civilized: it happily finds the nearest neighbor, even if the query point lies entirely to one side of the example data
data_xr.sel(lat=5, lon=125, method='nearest')
<xarray.DataArray ()>
array(2)
Coordinates:
    lat      int64 10
    lon      int64 130
This can easily cause tricky bugs.
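As a partial guard with the current API, .sel accepts a tolerance argument alongside method='nearest'. The sketch below reuses the example array and assumes, as we understand it from the xarray documentation, that a lookup with no grid point within the tolerance raises a KeyError. The tolerance values (5 and 2) are arbitrary choices for illustration. Note that tolerance limits the distance to the nearest grid point rather than checking the bounds, so it is not quite the same thing as refusing to extrapolate.

```python
import numpy as np
import xarray as xr

data_xr = xr.DataArray(
    np.array([[1, 2, 3], [4, 5, 6]]),
    coords={"lat": [10, 20], "lon": [120, 130, 140]},
    dims=["lat", "lon"],
)

# Query inside the grid: the nearest grid point (lat=10, lon=130) is
# within 5 units along both axes, so the lookup succeeds.
inside = data_xr.sel(lat=12, lon=128, method="nearest", tolerance=5)

# Query below the lat range: the nearest grid point (lat=10) is 5 units
# away, which exceeds tolerance=2, so the lookup fails loudly instead of
# silently snapping to the edge of the grid.
try:
    data_xr.sel(lat=5, lon=128, method="nearest", tolerance=2)
    out_of_range_raised = False
except KeyError:
    out_of_range_raised = True
```

A suitable tolerance depends on the grid spacing, so this is a workaround rather than a real bounds check.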
Discussion
Would it be possible for .sel to have a behavior that makes the user aware of such issues? I.e. either:
- print a warning on stderr
- return NaN
- raise an exception
when performing a .sel query that is outside of the dataset range, i.e. not in between two dataset points?
I understand that finding the nearest neighbor may still be useful / wanted in some cases even when being outside of the bounds of the dataset, but the fact that this happens silently by default has been causing bugs for us. Could this default behavior either be changed, or be made opt-in through a flag (for example allow_extrapolate=False by default, so that users have to consciously opt in)?
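To make the request concrete, here is a minimal user-side sketch of the behavior we have in mind. The helper name sel_nearest_checked and its allow_extrapolate flag are hypothetical and not part of xarray. It only compares each query value against the min / max of the corresponding coordinate before delegating to .sel, and it raises an exception, which is just one of the three options listed above.

```python
import numpy as np
import xarray as xr


def sel_nearest_checked(da, allow_extrapolate=False, **queries):
    """Nearest-neighbor .sel that refuses queries outside the coordinate range.

    Hypothetical helper for illustration only; not part of xarray.
    """
    if not allow_extrapolate:
        for dim, value in queries.items():
            coord = da[dim].values
            lo, hi = coord.min(), coord.max()
            query = np.asarray(value)
            # Reject any query value that falls outside [lo, hi] for this dim.
            if np.any(query < lo) or np.any(query > hi):
                raise ValueError(
                    f"{dim}={value!r} is outside the coordinate range [{lo}, {hi}]"
                )
    return da.sel(method="nearest", **queries)


data_xr = xr.DataArray(
    np.array([[1, 2, 3], [4, 5, 6]]),
    coords={"lat": [10, 20], "lon": [120, 130, 140]},
    dims=["lat", "lon"],
)

# Inside the bounds: behaves like a plain nearest-neighbor .sel.
inside = sel_nearest_checked(data_xr, lat=12, lon=128)

# Outside the bounds: raises unless the caller explicitly opts in.
try:
    sel_nearest_checked(data_xr, lat=5, lon=128)
    raised = False
except ValueError:
    raised = True

# Explicit opt-in restores the current silent behavior.
opted_in = sel_nearest_checked(data_xr, lat=5, lon=128, allow_extrapolate=True)
```

Whether the out-of-range case should raise, warn, or return NaN is exactly the open question; the sketch picks raising only because it is the hardest one to miss.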