Stable, meaningful benchmarks #1615

Open
1 of 4 tasks
ClaudiaComito opened this issue Aug 12, 2024 · 2 comments
Assignees: mrfh92
Labels: benchmarking, enhancement (New feature or request)

Comments

ClaudiaComito (Contributor) commented Aug 12, 2024

This issue tracks the changes needed to improve the reproducibility and meaningfulness of our benchmarks, now that they run on a dedicated cluster node (thanks @JuanPedroGHM!) and runtime variance has decreased dramatically.

As usual, feel free to add/edit.

@ClaudiaComito added the enhancement (New feature or request) and benchmarking labels on Aug 12, 2024
@mrfh92 self-assigned this on Oct 4, 2024
github-actions bot commented Oct 4, 2024

Branch features/1615-Stable_meaningful_benchmarks created!

mrfh92 (Collaborator) commented Oct 9, 2024

@ClaudiaComito @JuanPedroGHM I think we need some design decisions here, especially on:

  • Granularity of the benchmarks, e.g.:
    • Do we benchmark only high-level routines (e.g., the standard scaler) and leave out the low-level routines they build on (here: mean, std, in-place subtraction and division), or do we benchmark both? (See the sketch after this list.)
    • Do we choose one representative case, or do we try to cover as many cases as possible? For example, do we cover all split combinations or only a representative one (and how would we determine it)? Do we include non-split/trivially parallel routines as well?
  • Do we aim, at least for the beginning, at roughly equal run times for each benchmark, or at roughly equal data sizes? Or do we pursue a completely different approach for the benchmarks?
  • Regarding the names: do we keep the old names so that the old data can still be shown alongside the new results, or do we start completely from scratch once the new benchmark sizes and additional benchmarks are set up?
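To make the granularity question concrete, here is a minimal sketch of what benchmarking both levels might look like. It is an illustration only: the array size, the split choice, and the simple timing helper are assumptions made up for this discussion, not the project's actual benchmark harness, and it assumes Heat's sklearn-style StandardScaler in ht.preprocessing.

```python
# Minimal sketch for discussion only. The size, split, and timing helper are
# illustrative assumptions, not the actual benchmark configuration.
import time

import heat as ht


def timed(fn, *args, **kwargs):
    """Run fn once and return the elapsed wall-clock time in seconds.
    A real benchmark would repeat runs and synchronize all MPI ranks."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start


# Hypothetical data: 10,000 x 100 samples, distributed along the sample axis.
X = ht.random.randn(10_000, 100, split=0)

# Low-level routines: the building blocks of the standard scaler.
t_mean = timed(ht.mean, X, axis=0)
t_std = timed(ht.std, X, axis=0)

# High-level routine: the standard scaler itself.
scaler = ht.preprocessing.StandardScaler()
t_scaler = timed(scaler.fit_transform, X)

print(f"mean: {t_mean:.4f}s  std: {t_std:.4f}s  scaler: {t_scaler:.4f}s")
```

Benchmarking both levels would show whether a regression in a high-level routine comes from one of its building blocks; benchmarking only the high-level routine keeps the suite smaller but hides that information.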
