Add an API to compare benchmark results #5297
Comments
What about https://codspeed.io or https://bencher.dev? I like how Bencher is calling it continuous benchmarking.
These are all external tools and they are meant to be used in CI. Since we don't control what machine runs the benchmark, this issue is only about a manual check on your own machine. It's about iterating on the same function, not about catching regressions.
Sort of comparing performance between two branches? Would that be possible without JSON being written to disk?
The idea I am describing in the issue is to use the result from the JSON output that we already support.
Interesting idea! Speaking of which, it looks like their approach is that they require either …
We already support benchmark JSON output. We decided to remove the …
Hello @sheremet-va, that's actually a great idea. I'm working on skott, and especially on a pull request that is meant to compare skott's performance over time when running analysis on specific JS/TS projects. The current way I'm implementing result comparison only uses the n and n-1 versions, relying on git diff to witness the changes, but it would indeed be great to have a complete history and store more than one previous version. For that I tried codspeed, but the CI just takes a while; I don't know if it's related to the free tier or what, but the job takes more than 1h50 to complete, while just using the vitest bench API takes a few minutes, so it's not an option for me at all.
To be honest my primary concern was to run in the CI to limit hardware-related variations, even though adding OS-related information could be useful when running benchmarks from a local machine. I feel like diffing a set of benchmarks while only adding OS properties in the output might be confusing; you might want to do that at the filename level so that each OS combination can get its own diff. Currently I'm re-writing the outputFile generated by vitest into a custom format of my own. So at some point I was kind of wondering: should I create a vitest plugin and do something like codspeed, but instead of storing the data on my private cloud, just emit the data files in a dedicated location and provide a way to compare them over time? But if you're willing to integrate that in the core, it might not be that relevant? What do you think?
Adding OS-related information is just there to give a clear message that the benchmark might be off because it was run on another machine, or even to throw an error.
How do you store the generated output in CI?
Yeah, better to have more information than not enough. Also, I'm not sure how stable the GitHub Actions hardware is when not using custom pools; default pools might have agents with more or fewer cores, so variations can indeed happen there too.
For now it's only a JSON file written and committed by the CI in my repo (at a specific location nearby the benchmark files themselves), but it could be more sophisticated. Consequently, for now I'm not storing the history of all benchmarks; I'm just overwriting the file each time a new PR runs with the new results, which lets me diff the n and n-1 versions. But it would be great to keep track of all benchmarks nonetheless, to track big regressions/improvements over time; n and n-1 is a very small dataset.
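For illustration, here is a minimal sketch of what that kind of CI check could look like, assuming the committed file has been reduced to a flat array of `{ name, hz }` entries. The file paths, the shape, and the 10% threshold are all assumptions here, not Vitest's actual output format:

```ts
// check-regression.ts: hypothetical CI gate comparing the committed baseline
// against the results of the current run and failing on large slowdowns.
// The { name, hz } shape is an assumed simplification, not Vitest's real JSON.
import { readFileSync } from 'node:fs'

interface BenchEntry {
  name: string
  hz: number // operations per second, higher is better
}

const ALLOWED_SLOWDOWN = 0.1 // fail when a benchmark is more than 10% slower

const baseline: BenchEntry[] = JSON.parse(readFileSync('benchmarks/baseline.json', 'utf8'))
const current: BenchEntry[] = JSON.parse(readFileSync('benchmarks/current.json', 'utf8'))

const baselineByName = new Map(baseline.map(b => [b.name, b.hz]))
const regressions = current.filter((b) => {
  const base = baselineByName.get(b.name)
  return base !== undefined && b.hz < base * (1 - ALLOWED_SLOWDOWN)
})

if (regressions.length > 0) {
  console.error('Benchmark regressions detected:', regressions.map(b => b.name).join(', '))
  process.exitCode = 1
}
```

After a run that passes, the workflow would overwrite and re-commit the baseline file, which matches the n/n-1 approach described above.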
Assuming that the use case is mostly a local comparison, I made a prototype as a custom reporter. Demo:

```
# I used VITEST_BENCH_xxx env vars for the prototype

# save bench data on main branch
# --benchmark.outputFile=main.json
$ VITEST_BENCH_OUTPUT_FILE=main.json npx vitest bench --run
...

# suppose you switched a branch and compare against main
# --benchmark.compare=main.json
$ VITEST_BENCH_COMPARE=main.json npx vitest bench --run
...

 RUN  v1.4.0 /home/projects/vitest-dev-vitest-mm6syc

[... here current default reporter ....]

 ✓ test/basic.bench.ts (2) 2483ms
   ✓ sort (2) 2471ms
     name       hz       min      max      mean     p75      p99      p995     p999     rme      samples
   · normal   58.7590  13.7350  40.1650  17.0187  16.4350  40.1650  40.1650  40.1650  ±13.39%  30   fastest
   · reverse  10.6565  74.6000  115.93   93.8395  114.16   115.93   115.93   115.93   ±14.26%  10

[... custom compare reporter ....]

[BENCH] Comparison
  current  : bench-default.json
  baseline : main.json

  sort
    normal   58.759hz  [baseline: 73.419hz]  [change: 0.80x ⇓]
    reverse  10.656hz  [baseline: 10.870hz]  [change: 0.98x ⇓]
```

A few things I noticed: …
FYI, I was also looking at prior art, and for example this one https://bheisler.github.io/criterion.rs/book/user_guide/command_line_options.html#baselines has three flags for comparison purposes: …
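To make the comparison step in the prototype above concrete, here is a minimal sketch of the formatting logic, assuming the baseline and current results have already been flattened into plain name-to-hz records. The flattening, the record shape, and the output layout are assumptions, not Vitest's or tinybench's actual API:

```ts
// compare-format.ts: sketch of prototype-style comparison output,
// assuming results were already flattened to a `name -> hz` record.
interface BenchSummary {
  [name: string]: number // operations per second
}

function formatComparison(current: BenchSummary, baseline: BenchSummary): string[] {
  return Object.entries(current).map(([name, hz]) => {
    const base = baseline[name]
    if (base === undefined)
      return `${name}  ${hz.toFixed(3)}hz  [no baseline]`
    const change = hz / base
    const arrow = change >= 1 ? '⇑' : '⇓'
    return `${name}  ${hz.toFixed(3)}hz  [baseline: ${base.toFixed(3)}hz]  [change: ${change.toFixed(2)}x ${arrow}]`
  })
}

// Example with the numbers from the demo above:
console.log(formatComparison(
  { normal: 58.759, reverse: 10.656 },
  { normal: 73.419, reverse: 10.87 },
).join('\n'))
```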
@sheremet-va Can you elaborate on this? I thought this was a Vitest benchmark reporter feature, but you want to move this feature to tinybench?
If I remember correctly, the reporter already gets the results sorted, so we can't do this only in our own reporter, because custom reporters would have to reimplement it.
I wanted to make sure that tinybench supports providing results based on benchmarks that were not actually run.
If this is a first-class feature, I think it is fine. We already expose some flags that are only relevant for reporters.
We can change the format of the JSON output. Benchmark is an experimental feature and doesn't follow semver.
What do you mean by "sorted"? I don't know how the current default tty reporter works, but if the comparison summary is required only at the end like in my prototype, then it has complete information as …
The result in the … This is just my expectation based on what we already do. If other frameworks do it differently and more ergonomically, then we can follow them, but I would expect to see the difference in all table columns, no?
I might still be missing something, but let me confirm with my PR. My proof of concept is currently separated at the end, but I think it's possible to show it together in the current table like you expect, without changing anything in tinybench. Let me try that in the PR.
It's a rough mockup, but it's possible to create a table like this: … @sheremet-va Is the direction okay, or did you have something else in mind?
Yes, this is how I imagined it. I am not qualified enough to tell if it's better this way, so I would wait for more feedback. What we can also do is reverse the table and duplicate the column instead: https://github.com/google/benchmark/blob/main/docs/tools.md#modes-of-operation. Also, maybe we should print two tables: the first is the same as we do now, and the second one has the difference in each field. Side note: it would be awesome if we could create a graph 👀
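As a rough sketch of the "second table with a difference in each field" idea, a diff table could be derived from two result sets like this. The metric fields mirror a few of the columns in the demo output above; the input shape and the sample numbers are assumptions, not Vitest's format:

```ts
// diff-table.ts: sketch of a second table where every column is a
// current/baseline ratio. The input shape is an assumption, not Vitest's format.
interface BenchStats {
  name: string
  hz: number
  mean: number
  p75: number
  p99: number
}

function diffTable(current: BenchStats[], baseline: BenchStats[]) {
  const byName = new Map(baseline.map(b => [b.name, b]))
  return current
    .filter(b => byName.has(b.name))
    .map((b) => {
      const base = byName.get(b.name)!
      const ratio = (key: 'hz' | 'mean' | 'p75' | 'p99') =>
        `${(b[key] / base[key]).toFixed(2)}x`
      return { name: b.name, hz: ratio('hz'), mean: ratio('mean'), p75: ratio('p75'), p99: ratio('p99') }
    })
}

// Printed alongside the normal table; the numbers below are made up purely for illustration.
console.table(diffTable(
  [{ name: 'normal', hz: 58.759, mean: 17.02, p75: 16.44, p99: 40.17 }],
  [{ name: 'normal', hz: 73.419, mean: 13.62, p75: 13.90, p99: 32.10 }],
))
```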
Clear and concise description of the problem
There is currently no way to compare benchmark results between different runs. For example, it's possible to change the implementation details without changing the benchmark, and there is no way to confirm that it didn't introduce any regressions except manually taking screenshots of the terminal.
Suggested solution
Provide a flag to the `vitest bench` command (like `--compare=./bench.json`) to compare the current benchmark run with the previous one. We already support dumping the benchmark result in JSON via `--reporter=json`, so it can be reused for diffing.

Alternative
No response
Additional context
I think it should print results the same way we do now, in a table, just also comparing them with the previously stored results.
I think it might also be nice to store some information about the current machine (OS, cores, anything that can influence the result) in that benchmark and show it in the table.
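As a sketch of what capturing that machine information could look like, Node's built-in os module already exposes the relevant details. The field names and the idea of storing them next to the results are assumptions, not an existing Vitest format:

```ts
// machine-info.ts: sketch of machine metadata that could be stored next to
// benchmark results; the shape is hypothetical, not an existing Vitest format.
import os from 'node:os'

interface MachineInfo {
  platform: string
  release: string
  arch: string
  cpuModel: string
  cpuCount: number
  totalMemoryGiB: number
}

function collectMachineInfo(): MachineInfo {
  const cpus = os.cpus()
  return {
    platform: os.platform(),
    release: os.release(),
    arch: os.arch(),
    cpuModel: cpus[0]?.model ?? 'unknown',
    cpuCount: cpus.length,
    totalMemoryGiB: Math.round(os.totalmem() / 1024 ** 3),
  }
}

// Stored alongside the results, a reporter could warn (or error) when the
// baseline was produced on a different machine, as discussed above.
console.log(collectMachineInfo())
```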