Profile-Guided Optimization (PGO) benchmark report #7

Open
zamazan4ik opened this issue Sep 6, 2024 · 2 comments

zamazan4ik commented Sep 6, 2024

Hi!

I recently read the article about Candystore. As far as I can see, the project cares a lot about performance, so I decided to run some benchmarks with more advanced compiler optimizations.

As I have done many times before, I tested Profile-Guided Optimization (PGO) to see how it affects the library's performance. For reference, results for other projects (including many optimized databases, parsers, compilers, etc.) are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped many other libraries, I applied it to Candystore to see whether a performance win (or loss) can be achieved. Here are my benchmark results.

Test environment

  • Fedora 40
  • Linux kernel 6.10.7
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - rustc 1.83.0-nightly
  • Candystore version: main branch on commit 263d385c1bd0ba00a5d813a1fbf789c72baf9ae8
  • Disabled Turbo boost

Benchmark

For benchmarking, I use the benchmarks built into the project - candy-perf. For PGO, I use the cargo-pgo tool. All measurements, both the benchmark runs and the PGO training runs, are done with the same command - taskset -c 0 candy_perf.

taskset -c 0 is used for reducing the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).

Also, I decided to enable LTO (lto = true for [profile.release] in the root Cargo.toml) for candy-perf - it can help the compiler perform more aggressive optimizations.
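The workflow described above can be sketched roughly as follows. This is a minimal sketch, not the exact commands used for this report: the target triple, the output path of the binary, and the `run` / `pgo_workflow` helper names are assumptions for illustration. It also assumes cargo-pgo is already installed and that `lto = true` is already set in the root Cargo.toml. By default the script only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch of a cargo-pgo workflow. Helper names, the target triple and the
# binary path are illustrative assumptions, not taken from the report.
set -eu

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

pgo_workflow() {
    # Assumed target triple for an x86_64 Linux machine.
    target="x86_64-unknown-linux-gnu"

    # 1. Build the instrumented binary.
    run cargo pgo build

    # 2. Training run: the same command as for the measurements,
    #    pinned to core 0 with taskset.
    run taskset -c 0 "./target/${target}/release/candy_perf"

    # 3. Rebuild, using the profiles collected during the training run.
    run cargo pgo optimize
}

pgo_workflow
```

Running the optimized binary again with the same `taskset -c 0` command then gives the "LTO + PGO optimized" measurement.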

Results

I got the following results:

According to the results, PGO improves Candystore's performance in many cases.

I also took quick measurements of the candy-perf binary size:

  • Release: 895 KiB
  • LTO: 719 KiB
  • LTO + PGO optimized: 733 KiB
  • LTO + PGO instrumented: 1.5 MiB
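To put these sizes in relative terms, here is a small sketch that computes the change against the plain Release build. The `rel` helper is hypothetical, and the instrumented size is taken as 1536 KiB, which is an assumption because 1.5 MiB is a rounded figure.

```shell
# rel BASE SIZE: print the size change of SIZE relative to BASE, in percent.
# (Hypothetical helper, for illustration only.)
rel() {
    awk -v base="$1" -v size="$2" \
        'BEGIN { printf "%+.1f%%\n", (size - base) / base * 100 }'
}

rel 895 719    # LTO vs Release:                 -19.7%
rel 895 733    # LTO + PGO optimized vs Release: -18.1%
rel 895 1536   # LTO + PGO instrumented vs Release (1.5 MiB assumed to be 1536 KiB)
```

In other words, the PGO-optimized binary stays close to the LTO-only size, while the instrumented build is noticeably larger and is only needed for the training run.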

Further steps

I understand that the steps above can be time-consuming and hard to implement in practice. At the very least, the library's users can find this performance report and decide to enable PGO for their applications if they care about Candystore's performance in their workloads. Maybe a small note somewhere in the documentation will be enough to raise awareness about this work.

Thank you.

P.S. I created an Issue only because Discussions are disabled for the repo. Please don't treat it as a bug report - it's more of an improvement idea.

tomerfiliba (Member) commented

Thanks @zamazan4ik for this thorough review. I will link to it from the README.

Btw, I would never say no to more performance, but in the end the operations are dominated by a syscall and potential disk IO, which would outweigh any optimizations the compiler/linker may perform.

zamazan4ik (Author) commented

> Btw, I would never say no to more performance, but in the end the operations are dominated by a syscall and potential disk IO, which would outweigh any optimizations the compiler/linker may perform.

You are right - PGO only helps with the CPU part of a workload. IO won't magically go away, but the CPU part will still be improved, and the benchmarks above show some performance improvements even from optimizing "only" the CPU part.
