
Add benchmark suite #68

Open · 1 task
eric-czech opened this issue Jul 27, 2020 · 6 comments


Comments

eric-czech (Collaborator) commented Jul 27, 2020

We should track method performance using a benchmark suite like @alimanfoo mentioned in https://github.com/pystatgen/sgkit/pull/36#issuecomment-658893949.

It would be ideal if this ran as part of the build, but timings gathered that way are probably not very reliable. Apparently ASV suites are usually integrated into builds only as a smoke test to make sure they still run: astropy/astropy#6149. We should presumably have a separate, scheduled process that runs on dedicated resources to get accurate timings.
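For concreteness, a minimal ASV suite might look something like the sketch below. It assumes sgkit's `simulate_genotype_call_dataset` and `count_call_alleles`; treat the file layout and dataset sizes as placeholders rather than a committed design.

```python
# benchmarks/benchmarks.py -- minimal ASV (airspeed velocity) suite sketch
import sgkit as sg


class TimeAggregations:
    """Time a representative aggregation at a fixed dataset size."""

    def setup(self):
        # Keep the simulated data small enough that running the suite in CI
        # can double as the smoke test discussed above.
        self.ds = sg.simulate_genotype_call_dataset(n_variant=10_000, n_sample=1_000)

    def time_count_call_alleles(self):
        # .compute() forces the lazy dask computation so we time real work.
        sg.count_call_alleles(self.ds).compute()
```

A scheduled job on dedicated hardware could then run `asv run` and publish the timings with `asv publish`, while the build only checks that the suite still executes.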

As @hammer mentions in related-sciences/gwas-analysis#10, https://github.com/ornl-oxford/genben could also be a helpful resource (scikit-allel benchmarks).

Ideas for benchmarking:

hammer (Contributor) commented Jul 27, 2020

Wes has a lot of thoughts on this topic, e.g. https://discuss.ossdata.org/t/pooling-efforts-on-continuous-benchmarking-cb/206, which led to https://github.com/conbench/conbench.

eric-czech (Collaborator, Author) commented:

If our benchmarks involve single-server execution, the dask scheduler may be a useful dimension to add to their parameterization (cf. https://github.com/pystatgen/sgkit/issues/48#issuecomment-668665296); see the sketch below.
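For example, in an ASV-style suite the scheduler could become a benchmark parameter, so each timing is reported per scheduler. The sgkit names here are assumptions for illustration.

```python
# Sketch: parameterize a benchmark over dask's single-machine schedulers.
import dask
import sgkit as sg


class TimeCountVariantAlleles:
    # ASV runs the benchmark once per parameter value.
    params = ["synchronous", "threads", "processes"]
    param_names = ["scheduler"]

    def setup(self, scheduler):
        self.ds = sg.simulate_genotype_call_dataset(n_variant=100_000, n_sample=100)

    def time_count_variant_alleles(self, scheduler):
        # dask.config.set scopes the scheduler choice to this block.
        with dask.config.set(scheduler=scheduler):
            sg.count_variant_alleles(self.ds).compute()
```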

ravwojdyla (Collaborator) commented:

Just want to drop ann-benchmarks here as an inspiration. From its README:

Doing fast searching of nearest neighbors in high dimensional spaces is an increasingly important problem, but so far there has not been a lot of empirical attempts at comparing approaches in an objective way.
This project contains some tools to benchmark various implementations of approximate nearest neighbor (ANN) search for different metrics. We have pre-generated datasets (in HDF5 format) and we also have Docker containers for each algorithm. There's a test suite that makes sure every algorithm works.

hammer (Contributor) commented Feb 4, 2021

Dusting off an old Discourse post that contains a nice list of core operations: https://discourse.pystatgen.org/t/core-operations-in-human-gwas-workloads/41. We should probably have a benchmark for each of these core operations.

tomwhite (Collaborator) commented:

The code I used for benchmarking the GWAS workload on a cluster (using #438) is here: https://github.com/tomwhite/gwas-benchmark. These benchmarks are meant to be run on an ad hoc basis and involve quite a few manual steps (documented in the README).

A few questions:

  • How could it be more automated?
  • Where should results be recorded?
  • Where should it live?

I think it would be useful to have a cut-down version that runs on a single machine rather than a cluster, but is longer-running than the micro benchmarks in #458; something like the sketch below.
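As a rough illustration of that middle ground (the sgkit calls, dataset sizes, and the `variant_linreg_p_value` output name are assumptions, not what gwas-benchmark actually does):

```python
# Single-machine GWAS benchmark sketch: heavier than a micro benchmark,
# but no cluster required.
import time

import numpy as np
import xarray as xr
import sgkit as sg


def run_gwas_benchmark(n_variant=500_000, n_sample=5_000):
    ds = sg.simulate_genotype_call_dataset(n_variant=n_variant, n_sample=n_sample)
    # Alternate-allele dosage as the regression input.
    ds["call_dosage"] = ds.call_genotype.sum(dim="ploidy")
    # Synthetic trait and covariate, one value per sample.
    rng = np.random.default_rng(0)
    n = ds.sizes["samples"]
    ds["sample_trait"] = xr.DataArray(rng.normal(size=n), dims="samples")
    ds["sample_covariate"] = xr.DataArray(rng.normal(size=n), dims="samples")

    start = time.perf_counter()
    res = sg.gwas_linear_regression(
        ds, dosage="call_dosage", covariates="sample_covariate", traits="sample_trait"
    )
    res.variant_linreg_p_value.compute()  # force the dask computation
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"elapsed: {run_gwas_benchmark():.1f}s")
```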

hammer (Contributor) commented May 6, 2021

Conbench from Ursa Labs could be useful for us: https://ursalabs.org/blog/announcing-conbench. cc @arunkk09 and @LiangdeLI
