Evaluate Profile-Guided Optimization (PGO) #2288

Closed
zamazan4ik opened this issue Sep 13, 2023 · 3 comments

Comments

@zamazan4ik

Hi!

Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. LLVM-related results are here.

PGO shows measurable improvements in compiler-like workloads (CPython, Clang, Clangd, clang-format, GCC, Rustc, etc.), so I think it would be worth checking PGO on Triton as well.

The first step is to run PGO benchmarks on Triton and, if they show improvements, add a note to the documentation about the possible performance gains from building Triton with PGO. Providing an easier way to build Triton with PGO (e.g. a build option) would also be useful for end users, as well as for maintainers who rebuild packages. Testing post-link optimization techniques (like LLVM BOLT from Facebook) could be interesting too, but I recommend starting with the usual PGO.
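
As a reference, here is a minimal sketch of the usual three-phase clang PGO cycle, assuming a CMake-based C++ build with clang and llvm-profdata on PATH; the benchmark binary name (`bench`) is just a placeholder for a representative workload, and a real Triton build would wire these flags into its own build system instead.

```python
# Minimal sketch of the usual clang PGO cycle (instrument -> profile -> rebuild).
# Assumptions: a CMake-based C++ project in the current directory, clang as the
# compiler, and "./bench" as a placeholder for a representative workload.
import subprocess

def build(cxx_flags: str, build_dir: str) -> None:
    """Configure and build the project with the given extra C++ flags."""
    subprocess.run(
        ["cmake", "-S", ".", "-B", build_dir, f"-DCMAKE_CXX_FLAGS={cxx_flags}"],
        check=True,
    )
    subprocess.run(["cmake", "--build", build_dir, "-j"], check=True)

# 1. Instrumented build: binaries dump a .profraw file when they exit.
build("-fprofile-instr-generate", "build-instrumented")

# 2. Run a near-real-life workload to collect the profile (placeholder command).
subprocess.run(["./build-instrumented/bench"], check=True)

# 3. Merge the raw profile and rebuild with it applied.
subprocess.run(
    ["llvm-profdata", "merge", "-o", "default.profdata", "default.profraw"],
    check=True,
)
build("-fprofile-instr-use=default.profdata", "build-pgo")
```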

I think a good starting point would be recompiling third-party dependencies such as LLVM with PGO, if you have that option (it's usually a good thing to do when possible).

Another possible caveat is profile collection. For C++-based binaries it shouldn't be a problem, since binaries built with instrumentation support dump PGO profiles on exit (you can read more about that here). If you are using the C++ libraries from Python code, it can be trickier to dump the profiles. One option is to write a C++ "wrapper", run it on near-real-life workloads, collect the profiles, recompile the libraries with those profiles, and then use them from Python. You can also check how Pydantic integrated PGO into their build pipeline (they have a similar project structure, but use Rust instead of C++, which shouldn't make a huge difference here): pydantic/pydantic-core#741.
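
For the Python side, here is a minimal sketch of the idea, assuming the C++ extension (imported below as triton, with a hypothetical workload) was compiled with -fprofile-instr-generate: the instrumented runtime reads the LLVM_PROFILE_FILE environment variable and writes the raw profile when the process exits.

```python
# Sketch of collecting a PGO profile from Python, assuming the C++ extension
# was built with -fprofile-instr-generate. The workload below is a placeholder.
import os

# Tell the instrumented runtime where to write the profile (%p = process id);
# set this before the instrumented code runs.
os.environ["LLVM_PROFILE_FILE"] = "/tmp/triton-%p.profraw"

import triton  # noqa: E402  (import after setting the environment variable)

# ... run a near-real-life workload here so the hot C++ paths are exercised ...

# On interpreter exit the runtime dumps the .profraw file, which can then be
# merged with `llvm-profdata merge` and fed back into the build.
```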

@Jokeren
Contributor

Jokeren commented Sep 13, 2023

I don't think existing instrumentation-based PGO supports GPU applications. Correct me if I'm wrong (I had a talk with Tipp a long time ago).

@zamazan4ik
Author

I don't think existing instrumentation-based PGO supports GPU applications. Correct me if I'm wrong (I had a talk with Tipp a long time ago).

I haven't heard about using PGO for code executed on a GPU. But I thought Triton also has a large amount of CPU-only code, doesn't it?

@Jokeren
Contributor

Jokeren commented Sep 13, 2023

I haven't heard about using PGO for code executed on a GPU. But I thought Triton also has a large amount of CPU-only code, doesn't it?

It does, but much of that overhead can be mitigated through CUDA Graphs.

@Jokeren Jokeren closed this as completed Sep 18, 2023