Description
Is your feature request related to a problem?
I want to do coord1.isin(coord2), and it is quite slow when the coords are large and of object type CFTimeIndex.
import xarray as xr
import numpy as np

n = 1000
coord1 = xr.cftime_range(start='2000', freq='MS', periods=n)
coord2 = xr.cftime_range(start='2000', freq='3MS', periods=n)

# CFTimeIndex.isin: very fast
%timeit coord1.isin(coord2)  # 743 µs ± 1.33 µs
# np.isin on index.asi8
%timeit np.isin(coord1.asi8, coord2.asi8)  # 7.83 ms ± 14.1 µs

da = xr.DataArray(np.random.random((n, n)), dims=['a', 'b'], coords={'a': coord1, 'b': coord2})
# isin on xr.DataArray coordinates: slow
%timeit da.a.isin(da.b)  # 94.9 ms ± 959 µs
# converting the xr.DataArray coordinates back to indexes, then np.isin: slow
%timeit np.isin(da.a.to_index(), da.b.to_index())  # 97.4 ms ± 819 µs
# converting the xr.DataArray coordinates back to indexes, then np.isin on asi8: faster
%timeit np.isin(da.a.to_index().asi8, da.b.to_index().asi8)  # 7.89 ms ± 15.2 µs
Describe the solution you'd like
A faster coord1.isin(coord2) by default. Could we re-route here, e.g. to the alternative below? The conversion from coordinate via to_index() is costly, I guess.
Describe alternatives you've considered
np.isin(coord1.to_index().asi8, coord2.to_index().asi8) brings me nice speedups in pangeo-data/climpred#724.
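For illustration, the workaround could be wrapped roughly like this. It is only a sketch, not xarray API: the name isin_via_asi8 is hypothetical, and it assumes both inputs are either coordinates exposing to_index() or index objects that provide an integer view through .asi8. Anything else falls back to a plain np.isin.

```python
import numpy as np


def isin_via_asi8(coord1, coord2):
    """Hypothetical helper: membership test on integer timestamps."""
    # DataArray coordinates expose to_index(); other inputs are used as-is.
    index1 = coord1.to_index() if hasattr(coord1, "to_index") else coord1
    index2 = coord2.to_index() if hasattr(coord2, "to_index") else coord2

    # Assumption: both indexes offer an integer view via `.asi8`,
    # as in the workaround above. Comparing integers avoids the slow
    # element-wise comparison of Python objects.
    if hasattr(index1, "asi8") and hasattr(index2, "asi8"):
        return np.isin(index1.asi8, index2.asi8)

    # Fallback: ordinary np.isin on the underlying values.
    return np.asarray(np.isin(np.asarray(index1), np.asarray(index2)))
```

With the da from the example above, this would be called as isin_via_asi8(da.a, da.b). Note that it returns a plain NumPy boolean array rather than a DataArray, and that I have not checked whether integer views from indexes with different calendars are comparable, so it is probably only safe when both coordinates share a calendar.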
Additional context
Unsure whether this issue should go here or in cftime.