[AMD][BACKEND] Use PaddedLayout with AsyncCopy on gfx950 when pipelining #8365

AlexAUT · 2025-10-03T16:03:11Z

Adds support to the AMD pipeliner to compose PaddedSharedEncoding for ttg.async_copy_global_to_local on gfx950.

As described in #7929, padding alone cannot avoid bank conflicts on GFX9: due to the hardware design, padding can only be added at warp boundaries (64 threads × 16 bytes = 1024 bytes). In addition to padding, we therefore also reorder rows via the linearComponent of the PaddedSharedEncoding.

Rows are reordered so that 16 consecutive logical rows are placed in shared memory at a stride of 1024 bytes. For instance, if each row is 256 bytes, the layout would look like:
[[row0], [row16], [row32], [row48], /*1024bytes*/ [row1], [row17], [row33], [row49], /*2048bytes*/ [row2], [row18] ...]
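
The row order above can be reproduced with a small sketch. This is only an illustration of the reordering described here, not the code in this PR: the function name and parameters are hypothetical, the padding bytes themselves are not modeled, and repeating the pattern for every further tile of rows is an assumption.

```python
def physical_row_order(num_rows, row_bytes, stride_bytes=1024, rows_strided=16):
    """Return the logical row index stored at each physical row slot.

    Hypothetical helper: 16 consecutive logical rows are placed
    `stride_bytes` apart, so each 1024-byte block holds
    stride_bytes // row_bytes rows taken 16 logical rows apart.
    Padding is ignored here.
    """
    assert stride_bytes % row_bytes == 0, "row size must divide the stride"
    rows_per_block = stride_bytes // row_bytes   # 4 for 256-byte rows
    tile = rows_strided * rows_per_block         # 64 rows per repeating tile
    # Assumption: the pattern repeats for each full tile of rows.
    assert num_rows % tile == 0, "sketch only handles whole tiles"

    order = [None] * num_rows
    for row in range(num_rows):
        tile_idx, within = divmod(row, tile)
        # within = slot * rows_strided + block
        slot, block = divmod(within, rows_strided)
        position = tile_idx * tile + block * rows_per_block + slot
        order[position] = row
    return order


if __name__ == "__main__":
    # 64 rows of 256 bytes each, as in the example above.
    print(physical_row_order(64, 256)[:10])
```

For 256-byte rows the first ten slots come out as rows 0, 16, 32, 48, 1, 17, 33, 49, 2, 18, which matches the layout shown above.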

These layouts aim to reduce register pressure and instruction count compared to swizzled layouts, at the expense of a slightly larger LDS memory footprint.
This PR supports the case where the dtype is 16 bits wide and the tensor size is >= 16KB.

antiagainst

Nice!

third_party/amd/lib/TritonAMDGPUTransforms/ScheduleLoops.cpp

third_party/amd/lib/TritonAMDGPUTransforms/Utility.h

third_party/amd/lib/TritonAMDGPUTransforms/LowerLoops.cpp

AlexAUT · 2025-10-06T13:06:04Z

Thank you for the quick review, I think I addressed all comments.

The mfma32 case produces some bank conflicts due to refactoring before opening the PR. I am not sure if we want to wait for the fixes or not. I will have a fix for it a bit later today.

AlexAUT · 2025-10-06T15:22:50Z

37a9e80 fixes the bank conflicts for mfma32.

AlexAUT added 25 commits October 2, 2025 09:39

Store sharedencodingtrait instead of swizzledsharedencoding when look…

7c0ac7e

…ing for the smem layout
dev
Bank conflict free layouts for mfma32 and mfma16 with and without transpose
Bank conflict free kWidth=4
Fix debug prints
Put padded composition to separate function

Allow async copy for padded layouts

c3b6d04

Adjust to upstream changes

ab180b6

Update

dd351e0

Enable size checks for padded layouts

341a207

Refactor layout selection

7e27b9d

Simplify layout selection logic

8421dba

Cleanup

d3ea59c

Work on comments

870e18e

Cleanup compose function

18a7bdd

Refactor

24e69d3

Cleanup

f097355

Doc

fc1ad39

Cleanup

7ef0891

Cleanup

b476d82

Fix

5269d98

Disable matmul tests running out of LDS space

415e24c

Refactor into wide/narrow layouts

28ad504

Revert "Disable matmul tests running out of LDS space"

29f881f

This reverts commit 415e24c.

Remove old change

c9663c0

Enable async copy coalesce check for padded layouts

054bc61

Merge branch 'main' into asyncPaddedPipeline

7426a82

More cleanup after merge

c25c2e2

doc

0e9a68a

Doc

52ba7b1

antiagainst requested changes Oct 3, 2025

antiagainst marked this pull request as ready for review October 3, 2025 17:59

antiagainst requested a review from zhanglx13 as a code owner October 3, 2025 17:59

AlexAUT added 2 commits October 6, 2025 09:17

Merge branch 'main' into asyncPaddedPipeline

c28983e

Address review

a74c152

Fix mfma32 kContig bank conflicts

37a9e80

antiagainst approved these changes Oct 7, 2025

Merge branch 'main' into asyncPaddedPipeline

baadfab

antiagainst merged commit ac0bb72 into triton-lang:main Oct 7, 2025
9 checks passed
