Comprehensive benchmarking suite #4648
Comments
Thanks for the ping @dcherian, I really like the idea! One other thing that often gets neglected in test suites is operating on remote data. I understand the need to keep long-running tests, and tests prone to network failures, out of PR checks, but running these sorts of examples as a cron job could be very helpful for benchmarking and detecting issues. In intake-xarray we recently added tests against a local HTTP server and an "S3" server. We also added several simple tests that require a network connection to public data (no auth required); we currently run those locally but not in CI.
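For anyone wondering what a local-server test can look like, here is a minimal sketch using only the Python standard library. It is not the intake-xarray code: the names `serve_directory`, `timed_fetch`, `run_local_http_check`, and the file `data.bin` are made up for illustration, and a real test would presumably open the URL with xarray rather than read raw bytes.

```python
import functools
import http.server
import tempfile
import threading
import time
import urllib.request
from pathlib import Path


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that does not print a log line per request."""

    def log_message(self, format, *args):
        pass


def serve_directory(directory):
    """Serve `directory` over HTTP on a free localhost port (hypothetical helper)."""
    handler = functools.partial(QuietHandler, directory=str(directory))
    # Port 0 asks the OS for any free port, so parallel test runs do not collide.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def timed_fetch(url):
    """Download `url` and return (bytes, elapsed seconds)."""
    # An empty ProxyHandler keeps the request on localhost even if proxy
    # environment variables are set.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.perf_counter()
    with opener.open(url) as response:
        data = response.read()
    return data, time.perf_counter() - start


def run_local_http_check(payload=b"x" * 1_000_000):
    """Write a file, serve it locally, fetch it back, and time the fetch."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "data.bin").write_bytes(payload)
        server = serve_directory(tmp)
        try:
            port = server.server_address[1]
            return timed_fetch(f"http://127.0.0.1:{port}/data.bin")
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    data, elapsed = run_local_http_check()
    print(f"fetched {len(data)} bytes in {elapsed:.4f} s")
```

Because nothing leaves the machine, a check like this could run on every PR, while the tests that hit real public data stay in a scheduled job.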
Thanks @scottyhq
This lines up with the "pangeo integration tests" that came up in a Pangeo meeting (cc @rabernat). Regardless of whether it fits, I think adding benchmarks+tests for xarray+zarr+fsspec (or xarray+mfdataset+netCDF) is an important and unmet need of the Pangeo community in general that we could address.
This would be great. Down a couple of levels, I think we could potentially run this as a cron job on GitHub Actions. Running it at NCAR would also be a good plan. I'm also happy to supply a VM if that's helpful.
Looks like Quansight thinks that GH Actions is a good place to benchmark scikit-image: https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/ so maybe we can set that up for our existing benchmarks. Here's the workflow: https://github.com/jaimergp/scikit-image/blob/main/.github/workflows/benchmarks-cron.yml
@TomAugspurger are you still in charge of the pydata benchmarking machine? If so, could you add xarray to the list please (https://pandas.pydata.org/speed/)? @Illviljan has made major improvements, so it should be a lot faster now.
"In charge of" is overstating it a bit. It's been segfaulting when building pandas and I haven't had a chance to debug it. If / when I get around to fixing it I'll try adding xarray, but it might be a bit.
I think a good "infrastructure" target for the NASA OSS call would be to expand our benchmarking suite (https://pandas.pydata.org/speed/xarray/#/)
AFAIK running these in a useful manner on CI is still unsolved (please correct me if I'm wrong). But we can always run it on an NCAR machine using a cron job.
Thoughts?
cc @scottyhq
A quick survey of work needed (please append):
Related: #3514