-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BACKEND][AMD] Disable linear layout due to perf regression #4126
Conversation
We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows register pressure is increased to cause spills. Further analysis is still ongoing. So this commit introduces a minimal way to selectively disable linear layout only on AMD backend to avoid affecting NVIDIA backend while contining bring it up on AMD side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. I also observed something similar on nvidia, but never big enough to make a significant performance difference. I was never able to pin it down, though, because once I made smaller testcases the IR was simplified to exactly the same as without LLs. That is, I was seeing that there was something about "big testcases" that prevented some important optimization.
I tried things like increasing the recursion depth in instcombine and compute-known-bits but it didn't help for my testcase.
Yeah we are looking into this at the moment. Because in the down stream kernel (specifically, this one) it's already using lots of registers so the change in codegen causes spills. the flash attention tutorial is not as a big issue due to less registers used there. |
…ang#4126) We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows register pressure is increased to cause spills. Further analysis is still ongoing. So this commit introduces a minimal way to selectively disable linear layout only on AMD backend to avoid affecting NVIDIA backend while continuing bring it up on AMD side. (cherry picked from commit e8bc45d)
…ang#4126) We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows register pressure is increased to cause spills. Further analysis is still ongoing. So this commit introduces a minimal way to selectively disable linear layout only on AMD backend to avoid affecting NVIDIA backend while continuing bring it up on AMD side. (cherry picked from commit e8bc45d)
Cherry picks for release/3.0.x General: - e8bc45d [BACKEND][AMD] Disable linear layout due to perf regression (#4126) - 9a0a7c2 [AMD] Add basic verification to MFMA encoding (#4117) for RDNA: - 100e2aa [AMD][WMMA] Support dot3d (#3674) - 4a1ea8e [AMD][gfx11] Fix BF16 wmma instr generation (#4135) Proton HIP PRs: - 328b86d [PROTON] Refactor GPU profilers (#4056) - 60613fb [PROTON] Roctracer: convert agent id to gpu id for gpu ops (#4090) - c1776fa [PROTON][AMD] Add Proton HIP GPU Utilization Metrics (#4119) --------- Co-authored-by: Lei Zhang <antiagainst@gmail.com> Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com> Co-authored-by: Ilya V <152324710+joviliast@users.noreply.github.com> Co-authored-by: Keren Zhou <kerenzhou@openai.com> Co-authored-by: mwootton <michael.wootton@amd.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com>
…ang#4126) We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows register pressure is increased to cause spills. Further analysis is still ongoing. So this commit introduces a minimal way to selectively disable linear layout only on AMD backend to avoid affecting NVIDIA backend while continuing bring it up on AMD side.
We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows register pressure is increased to cause spills. Further analysis is still ongoing.
So this commit introduces a minimal way to selectively disable linear layout only on AMD backend to avoid affecting NVIDIA backend while contining bring it up on AMD side.