I think a good "infrastructure" target for the NASA OSS call would be to expand our benchmarking suite (https://pandas.pydata.org/speed/xarray/#/)
AFAIK, running these in a useful manner on CI is still unsolved (please correct me if I'm wrong), but we can always run them on an NCAR machine using a cron job.
Thoughts?
cc @scottyhq
A quick survey of work needed (please append):
- indexing & slicing
  - Improve indexing performance benchmarks #3382
  - Performance: numpy indexes small amounts of data 1000 faster than xarray #2799
  - Slow performance of isel #2227
- DataArray construction
  - Speed up Dataset._construct_dataarray #4744
- attribute access
  - Attribute style access is slow #4741
  - speedup attribute style access and tab completion #4742
- property access
  - Should we cache some small properties? #3514
- reindexing?
  - slow performance with open_mfdataset #1385 (comment)
- alignment
  - Performance problem when doing computation between two arrays with discontinuous indexes #3755
  - [skip-ci] Add alignment benchmarks #7738
- assignment
  - Needs performance check / improvements in value assignment of DataArray #1771
- coarsen
- groupby
  - groupby very slow compared to pandas #659
  - [skip-ci] Add cftime groupby, resample benchmarks #7795
  - Speed up .dt accessor by preserving Index objects. #7796
- resample
  - Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) #4498
  - [skip-ci] Add cftime groupby, resample benchmarks #7795
- weighted
  - Allow skipna in .dot() #4482
  - weighted operations: performance optimisations #3883
- concat
  - Improve concat performance #7824
- merge
- open_dataset, open_mfdataset
  - We need a fast path for open_mfdataset #1823
- stack / unstack
- apply_ufunc?
- interp
  - Faster interp #4740
  - Improve interp performance #7843
- reprs
  - Speed up Dataset._construct_dataarray #4744
- to_(dask)_dataframe
  - Improve to_dask_dataframe performance #7844
  - Add benchmarks for to_dataframe and to_dask_dataframe #7474
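For anyone picking up items from the list: the speed dashboard linked above is produced by airspeed velocity (asv), which discovers benchmark classes, calls `setup()` before timing, and times every method whose name starts with `time_` (optionally over `params`). Below is a rough sketch of what benchmarks for a few of the listed areas (indexing, groupby/resample, alignment with a discontinuous index, concat) might look like. The class names, array sizes and the module path in the first comment are hypothetical choices for illustration, not the benchmarks that currently exist in the xarray repo.

```python
# asv_bench/benchmarks/survey_sketch.py  (hypothetical module name)
# Plain classes following asv conventions; asv itself is not imported here.
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)  # fixed seed so runs are comparable


class Indexing:
    """Small selections, where overhead dominates over data movement."""

    def setup(self):
        self.da = xr.DataArray(
            rng.standard_normal((1000, 500)),
            dims=("x", "y"),
            coords={"x": np.arange(1000), "y": np.arange(500)},
        )

    def time_isel_scalar(self):
        self.da.isel(x=10, y=20)

    def time_sel_slice(self):
        self.da.sel(x=slice(100, 200))


class GroupByResample:
    """Four years of daily data, reduced by month two different ways."""

    def setup(self):
        time = pd.date_range("2000-01-01", periods=365 * 4, freq="D")
        self.ds = xr.Dataset(
            {"a": ("time", rng.standard_normal(time.size))}, coords={"time": time}
        )

    def time_groupby_month_mean(self):
        self.ds.groupby("time.month").mean()

    def time_resample_monthly_mean(self):
        self.ds.resample(time="MS").mean()


class Alignment:
    """Arithmetic between arrays whose indexes only partially overlap."""

    def setup(self):
        x = np.arange(10_000)
        self.a = xr.DataArray(rng.standard_normal(x.size), dims="x", coords={"x": x})
        # Every other label: overlapping but discontinuous index.
        self.b = self.a.isel(x=slice(None, None, 2))

    def time_align_inner(self):
        xr.align(self.a, self.b, join="inner")

    def time_binary_op(self):
        self.a + self.b


class Concat:
    """Concatenating many small datasets along an existing dimension."""

    params = [10, 100]
    param_names = ["n_datasets"]

    def setup(self, n_datasets):
        self.datasets = [
            xr.Dataset(
                {"a": ("x", rng.standard_normal(100))},
                coords={"x": np.arange(100) + 100 * i},
            )
            for i in range(n_datasets)
        ]

    def time_concat(self, n_datasets):
        xr.concat(self.datasets, dim="x")
```

Keeping the data construction in `setup()` means asv times only the operation under test; parametrizing `Concat` over the number of inputs is one way to see whether cost grows faster than linearly, which is the kind of question several of the linked issues raise.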
Related: #3514