Description
Is your feature request related to a problem? Please describe.
I've been using xarray with atmospheric data given on pressure levels. This data is best thought of in log(pressure) for computation, but it is stored and displayed as standard numbers. I would love it if there were some way to have data.sel(lev=4, method='nearest') return the nearest data in log-space without having to store my index in log space.
E.g., currently:
>>> a = xr.DataArray(['a', 'b', 'c'], dims='lev', coords=[('lev', [0.1, 10, 100])])
>>> a.sel(lev=2, method='nearest')
<xarray.DataArray ()>
array('a', dtype='<U1')
Coordinates:
lev float64 0.1
but in the scientific sense underlying this data, the closer point was actually 'b' at 10, not 'a' at 0.1.
In general, one can imagine situations where the opposite is true (storing data in log-space for numerical accuracy, but wanting a concept of 'nearest' which is the standard linear sense), or a desire for arbitrary scaling.
Describe the solution you'd like
The simplest solution I can imagine is to provide a preprocessor argument to the sel function which operates over numpy values and is used before the call to get_loc.
e.g.
>>> a.sel(lev=2, method='nearest', pre=np.log)
<xarray.DataArray ()>
array('b', dtype='<U1')
Coordinates:
lev float64 10.0
I believe this can be implemented by wrapping both index and label_value here with a call to the preprocess function (assuming the argument is only desired alongside the 'method' kwarg):
Lines 224 to 226 in 9daf9b1
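In the meantime, the same behaviour can be approximated in user code. The following is only a sketch of the idea, not the proposed change to xarray itself: `sel_nearest` and its `pre` parameter are hypothetical names, and the helper assumes a one-dimensional, monotonic, strictly positive coordinate so that `np.log` is defined and pandas can do a nearest lookup.

```python
import numpy as np
import pandas as pd
import xarray as xr


def sel_nearest(da, dim, value, pre=np.log):
    """Hypothetical helper: nearest-neighbour selection in a transformed space."""
    # Transform the coordinate values and build a temporary pandas index.
    transformed = pd.Index(pre(da[dim].values))
    # Find the position whose transformed value is closest to the transformed label.
    pos = transformed.get_indexer([pre(value)], method="nearest")[0]
    # Select by position so the stored coordinate keeps its original values.
    return da.isel({dim: pos})


a = xr.DataArray(["a", "b", "c"], dims="lev", coords={"lev": [0.1, 10, 100]})
result = sel_nearest(a, "lev", 2)
```

Here `result` is the element at lev=10, whereas the plain `a.sel(lev=2, method='nearest')` returns the element at lev=0.1.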
Describe alternatives you've considered
I'm not sure how this would relate to the ongoing work on #1603, but one solution is to include a concept of the underlying number line within the index API. The end result is similar to the proposed implementation, but it would be stored with the index rather than passed to the sel method each time. This may keep the sel API simpler if this feature were only available for a special ScaledIndex class or something like that.
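To make the alternative concrete, here is a rough sketch of what storing the number line with the index could look like. This `ScaledIndex` is purely illustrative: it is not an existing xarray or pandas class, it does not implement any real index interface, and the `transform` attribute and `get_nearest` method are made-up names.

```python
import numpy as np


class ScaledIndex:
    """Hypothetical index that remembers its own number line (not an xarray class)."""

    def __init__(self, values, transform=np.log):
        self.values = np.asarray(values, dtype=float)
        self.transform = transform
        # Cache the transformed values once, since every lookup reuses them.
        self._scaled = transform(self.values)

    def get_nearest(self, label):
        """Return the position of the entry nearest to `label` in the transformed space."""
        return int(np.argmin(np.abs(self._scaled - self.transform(label))))


lev = ScaledIndex([0.1, 10, 100])
pos = lev.get_nearest(2)
```

With the transform attached to the index, callers would not need to pass anything extra at selection time; `pos` here points at the entry with value 10.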
One version of this could also be used to set reasonable defaults when plotting, e.g. if a coordinate has a log number line, then the x/yscale could be set to 'log' by default when plotting over that coordinate.