tools: Add a tool to aid with running benchmarks #1454
Conversation
README.md
> Finally, for an especially large binary, we link the chromium web browser with debug info.
> See [benchmarks/ryzen-9955hx.md](benchmarks/ryzen-9955hx.md) for benchmark results.
While the benchmark content seems adequate, I believe the README should also include key benchmark results (particularly those related to link times), as before. Many new users of Wild will have the question "What makes it so superior to existing linkers?", so having an answer readily available in plain sight would be valuable. (In that regard, I think moving "What's working?", "What isn't yet supported?" and the benchmarks to the top of the README would be a good idea too, though these changes may not fall within the scope of this PR.)
Yeah, I agree. A large number of benchmarks is fine, but then we should have a TL;DR section summarizing the results a bit.
```rust
markdown.push_str(&format!(
    "\n\n",
    benchmark.config.name
));
```
I think there should be text with the name of the scenario before the images. That would allow you to quickly search if you knew what you were looking for.
While we're at it, some rough table of contents would be nice too.
Fully agree with both suggestions! Regarding the latter: I would also include a ratio (%) comparison for easier orientation. Digesting the speed-up from the pictures alone takes some mental capacity :)
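The two suggestions above could be sketched roughly like this (not the PR's actual code; `Benchmark`, `Config`, and the image-filename format are assumptions based on the names visible in the diff, and the anchor format follows GitHub's usual heading-slug convention):

```rust
// Hypothetical sketch: emit a searchable heading before each benchmark
// image, plus a matching table-of-contents entry that links to it.

struct Config {
    name: String,
}

struct Benchmark {
    config: Config,
}

/// GitHub-style heading anchor: lowercased, spaces replaced with '-'.
fn heading_anchor(name: &str) -> String {
    name.to_lowercase().replace(' ', "-")
}

fn push_benchmark_section(markdown: &mut String, benchmark: &Benchmark) {
    // Heading first, so readers can Ctrl-F the scenario name.
    markdown.push_str(&format!("## {}\n\n", benchmark.config.name));
    // Image filename derived from the scenario name (an assumption).
    markdown.push_str(&format!("![{0}]({0}.svg)\n\n", benchmark.config.name));
}

fn push_toc_entry(markdown: &mut String, benchmark: &Benchmark) {
    markdown.push_str(&format!(
        "- [{0}](#{1})\n",
        benchmark.config.name,
        heading_anchor(&benchmark.config.name)
    ));
}
```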
I've just prototyped the % change thing. Initially I made the baseline be whatever was slowest (or largest for memory). I didn't really like that though because it sometimes was LLD, sometimes was Mold and if GNU ld was included, then that was the baseline. I contemplated making LLD always be the baseline, but in the end decided to make the last benchmark (which is the latest version of Wild) be the baseline. That means that the other linkers are generally reported as positive percentage increases - except for ripgrep-memory where GNU ld uses less memory than Wild. You can see the results here:
https://github.com/davidlattimore/wild/blob/a9ed091c45244d5ca05928b02448144281c343ac/benchmarks/ryzen-9955hx.md
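The baseline scheme described above amounts to something like the following sketch (illustrative only; the real tool's code may differ). The last entry is the latest Wild, so other linkers generally come out as positive percentage increases:

```rust
// Report each result as a percentage change relative to a fixed baseline:
// the last entry in the list, which is the latest version of Wild.
// Positive values mean slower (or more memory) than the baseline.
fn percent_vs_baseline(values: &[f64]) -> Vec<f64> {
    let baseline = *values.last().expect("need at least one result");
    values
        .iter()
        .map(|v| (v - baseline) / baseline * 100.0)
        .collect()
}

// e.g. percent_vs_baseline(&[2.0, 1.5, 1.0]) yields [100.0, 50.0, 0.0]
```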
```diff
 bumpalo-herd = "0.1.2"
 bytesize = "2.0.0"
-clap = { version = "4.0.0", features = ["derive"] }
+clap = { version = "4.1.0", features = ["derive"] }
```
I may be a bit too pedantic here, but is there any reason to update the version here?
This seems to be due to the minimal-versions check: https://github.com/davidlattimore/wild/actions/runs/20982989095/job/60311647318
Yep, that's it exactly. With this change, Wild no longer builds if we use clap version 4.0.0, so I bumped the minimum version. 4.1.0 is three years old, so it's unlikely that anyone would want to use an older version together with any of our crates.
Can we rely on

Generally speaking, I really like the graphs, and especially the improvement that has been achieved across Wild's releases. It's an even better investment for the future, as the time series will only become more interesting.
Thanks for the feedback, folks! I've added a few interesting benchmarks back to the README, and we now link directly to each benchmark page from it.

The newly added percentages at the bottom of the charts show how each benchmark result differs from the latest version of Wild. Using the latest version of Wild as the baseline makes sense, since it lets us easily compare it with previous versions of Wild or with other linkers. I added spaces to make numbers over 1000 more readable. I didn't want different charts to use different units, because it would be easy to miss that the unit had changed.

I've added a Raspberry Pi benchmarks page and restored one Raspberry Pi benchmark to the main README. I also added confidence intervals, which show how confident we are in the mean that we computed for each benchmark. Where a confidence interval is too large, we could reduce it by running more iterations of that benchmark.
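The confidence intervals mentioned above could be computed along these lines (a rough sketch using the normal approximation, 1.96 standard errors; the tool's actual statistics may differ, and the sample values in the test are made up):

```rust
// 95% confidence interval for a benchmark's mean, via the normal
// approximation: mean ± 1.96 * (sample stddev / sqrt(n)).
// Running more iterations increases n, which shrinks the half-width.
// Requires at least two samples (Bessel's correction divides by n - 1).
fn mean_with_ci95(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    assert!(n >= 2.0, "need at least two samples");
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance with Bessel's correction, then the standard error.
    let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let half_width = 1.96 * (var / n).sqrt();
    (mean, half_width) // interval is mean ± half_width
}
```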
This PR is now just the tool. The updated benchmarks are now part of the PR to release 0.8.0: #1456
When I try to run benchmarks with this tool, it doesn't seem to set paths correctly. I've patched it with
It's supposed to be a file. Perhaps delete the directory?
I perhaps should have mentioned that this tool is really intended for doing bulk runs of benchmarks, i.e. running all the benchmarks for a release. For one-off, single-program benchmarks, I'd still recommend using poop or hyperfine.
Ah, that makes sense. I was thrown off by the fact that the PathBuf formatting adds a trailing slash to the path in the command:

which fails with stat 9.9 on my system:

Edit: never mind, I messed up, so the above part isn't relevant. The tool also doesn't create the file beforehand, so I wonder if it would be better to just check the parent directory of `args.tmp` instead.
Ah, that makes sense. I added the call to `stat` after I'd been running the tool for a while, so the output file would have already existed for me. Yes, checking the parent directory sounds like the thing to do. You're welcome to send a PR to fix it if you'd like; otherwise, I'll do it.
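The fix discussed here might look something like the following (a sketch, not the tool's actual code; `check_output_path` and the error messages are hypothetical, and `args.tmp` is the PathBuf from the thread):

```rust
use std::path::Path;

// Since the output file need not exist yet, validate its parent
// directory rather than the path itself, and reject directories.
fn check_output_path(tmp: &Path) -> Result<(), String> {
    if tmp.is_dir() {
        return Err(format!(
            "{} is a directory; expected a file path",
            tmp.display()
        ));
    }
    match tmp.parent() {
        // A bare filename like "out.txt" has an empty parent, meaning
        // the current directory, which is fine.
        Some(parent) if parent.as_os_str().is_empty() || parent.is_dir() => Ok(()),
        Some(parent) => Err(format!(
            "parent directory {} does not exist",
            parent.display()
        )),
        None => Err("path has no parent directory".to_string()),
    }
}
```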