[PERFORMANCE]: isin on CFTimeIndex-backed Coordinate slow  #6230

Open
@aaronspring


Is your feature request related to a problem?

I want to call coord1.isin(coord2), and it is quite slow when the coordinates are large and backed by a CFTimeIndex (object dtype).

# IPython / Jupyter session (%timeit is an IPython magic)
import xarray as xr
import numpy as np

n = 1000
coord1 = xr.cftime_range(start='2000', freq='MS', periods=n)
coord2 = xr.cftime_range(start='2000', freq='3MS', periods=n)

# CFTimeIndex.isin directly: very fast
%timeit coord1.isin(coord2)  # 743 µs ± 1.33 µs

# np.isin on the integer representation (index.asi8)
%timeit np.isin(coord1.asi8, coord2.asi8)  # 7.83 ms ± 14.1 µs

da = xr.DataArray(np.random.random((n, n)), dims=['a', 'b'], coords={'a': coord1, 'b': coord2})

# isin on the xr.DataArray coordinates: slow
%timeit da.a.isin(da.b)  # 94.9 ms ± 959 µs

# np.isin after converting the coordinates back to indexes: slow
%timeit np.isin(da.a.to_index(), da.b.to_index())  # 97.4 ms ± 819 µs

# np.isin on asi8 after converting the coordinates back to indexes: fast
%timeit np.isin(da.a.to_index().asi8, da.b.to_index().asi8)  # 7.89 ms ± 15.2 µs
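Why the asi8 route can give the same answer as the object route: within one calendar, each date maps to a unique integer (CFTimeIndex.asi8 returns int64 microseconds since 1970-01-01), so membership on the integers matches membership on the date objects, while numpy can compare int64 values natively instead of calling Python-level comparisons on every object. The sketch below needs only numpy and the standard library: stdlib datetime objects stand in for cftime datetimes (proleptic Gregorian calendar only), and month_starts and to_int_microseconds are hypothetical helpers, not xarray API. Timings are printed but will vary by machine.

```python
import timeit
from datetime import datetime, timedelta

import numpy as np

EPOCH = datetime(1970, 1, 1)


def month_starts(start_year: int, step_months: int, periods: int) -> list:
    """Hypothetical stand-in for cftime_range(freq=f"{step_months}MS") using stdlib datetimes."""
    dates = []
    for k in range(periods):
        months = k * step_months
        dates.append(datetime(start_year + months // 12, months % 12 + 1, 1))
    return dates


def to_int_microseconds(dates) -> np.ndarray:
    """Encode each date as int64 microseconds since 1970-01-01 (the same unit as CFTimeIndex.asi8)."""
    return np.array([(d - EPOCH) // timedelta(microseconds=1) for d in dates], dtype=np.int64)


n = 1000
dates1 = month_starts(2000, 1, n)  # monthly, like coord1
dates2 = month_starts(2000, 3, n)  # every third month, like coord2

obj1 = np.array(dates1, dtype=object)
obj2 = np.array(dates2, dtype=object)
int1 = to_int_microseconds(dates1)
int2 = to_int_microseconds(dates2)

# Object path: each comparison is a Python-level datetime comparison.
mask_obj = np.isin(obj1, obj2)
# Integer path: numpy compares int64 values natively.
mask_int = np.isin(int1, int2)

# The encoding is one-to-one, so both paths agree element by element.
assert (mask_obj == mask_int).all()

t_obj = timeit.timeit(lambda: np.isin(obj1, obj2), number=3) / 3
t_int = timeit.timeit(lambda: np.isin(int1, int2), number=3) / 3
print(f"object dtype: {t_obj * 1e3:.2f} ms per call")
print(f"int64:        {t_int * 1e3:.2f} ms per call")
```

Every third month of the first series also appears in the second, so 334 of the 1000 entries come out True on both paths.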

Describe the solution you'd like

I'd like coord1.isin(coord2) to be fast by default. Could DataArray.isin re-route CFTimeIndex-backed coordinates to a faster path, e.g. the asi8-based alternative below?

My guess is that converting the coordinate back to an index with to_index() is the costly step.

Describe alternatives you've considered

np.isin(coord1.to_index().asi8, coord2.to_index().asi8) gives me nice speedups in pangeo-data/climpred#724.

Additional context

I am unsure whether this issue belongs here or in cftime.
