[BACKEND][AMD] Disable linear layout due to perf regression #4126

antiagainst · 2024-06-12T04:52:52Z

We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows register pressure is increased to cause spills. Further analysis is still ongoing.

So this commit introduces a minimal way to selectively disable linear layout only on AMD backend to avoid affecting NVIDIA backend while contining bring it up on AMD side.

We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows register pressure is increased to cause spills. Further analysis is still ongoing. So this commit introduces a minimal way to selectively disable linear layout only on AMD backend to avoid affecting NVIDIA backend while contining bring it up on AMD side.

jlebar

SGTM. I also observed something similar on nvidia, but never big enough to make a significant performance difference. I was never able to pin it down, though, because once I made smaller testcases the IR was simplified to exactly the same as without LLs. That is, I was seeing that there was something about "big testcases" that prevented some important optimization.

I tried things like increasing the recursion depth in instcombine and compute-known-bits but it didn't help for my testcase.

antiagainst · 2024-06-12T05:50:27Z

Yeah we are looking into this at the moment. Because in the down stream kernel (specifically, this one) it's already using lots of registers so the change in codegen causes spills. the flash attention tutorial is not as a big issue due to less registers used there.

…ang#4126) We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows register pressure is increased to cause spills. Further analysis is still ongoing. So this commit introduces a minimal way to selectively disable linear layout only on AMD backend to avoid affecting NVIDIA backend while continuing bring it up on AMD side. (cherry picked from commit e8bc45d)

Cherry picks for release/3.0.x General: - e8bc45d [BACKEND][AMD] Disable linear layout due to perf regression (#4126) - 9a0a7c2 [AMD] Add basic verification to MFMA encoding (#4117) for RDNA: - 100e2aa [AMD][WMMA] Support dot3d (#3674) - 4a1ea8e [AMD][gfx11] Fix BF16 wmma instr generation (#4135) Proton HIP PRs: - 328b86d [PROTON] Refactor GPU profilers (#4056) - 60613fb [PROTON] Roctracer: convert agent id to gpu id for gpu ops (#4090) - c1776fa [PROTON][AMD] Add Proton HIP GPU Utilization Metrics (#4119) --------- Co-authored-by: Lei Zhang <antiagainst@gmail.com> Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com> Co-authored-by: Ilya V <152324710+joviliast@users.noreply.github.com> Co-authored-by: Keren Zhou <kerenzhou@openai.com> Co-authored-by: mwootton <michael.wootton@amd.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com>

…ang#4126) We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows register pressure is increased to cause spills. Further analysis is still ongoing. So this commit introduces a minimal way to selectively disable linear layout only on AMD backend to avoid affecting NVIDIA backend while continuing bring it up on AMD side.

antiagainst requested a review from jlebar June 12, 2024 04:52

jlebar approved these changes Jun 12, 2024

View reviewed changes

antiagainst marked this pull request as ready for review June 12, 2024 05:50

antiagainst requested review from zhanglx13 and ptillet as code owners June 12, 2024 05:50

antiagainst merged commit e8bc45d into triton-lang:main Jun 12, 2024
6 checks passed

antiagainst deleted the amd-disable-ll branch June 12, 2024 05:51

jataylo mentioned this pull request Jun 20, 2024

[RELEASE] [AMD] Additional AMD cherry-picks #4175

Merged

jlebar mentioned this pull request Jun 25, 2024

Pass repr=key when calling JITFunction.cache_hook #4207

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BACKEND][AMD] Disable linear layout due to perf regression #4126

[BACKEND][AMD] Disable linear layout due to perf regression #4126

antiagainst commented Jun 12, 2024

jlebar left a comment

antiagainst commented Jun 12, 2024

[BACKEND][AMD] Disable linear layout due to perf regression #4126

[BACKEND][AMD] Disable linear layout due to perf regression #4126

Conversation

antiagainst commented Jun 12, 2024

jlebar left a comment

Choose a reason for hiding this comment

antiagainst commented Jun 12, 2024