[BACKEND] Fix a special case where elements along the k dimension are repeated within each thread #5121

Jokeren · 2024-11-12T03:45:37Z

This PR includes the following changes:

Adds comprehensive tests for mixed-precision dot products, including configurations such as f8xf16, i8xf16, f8xf32, and i8xf32.
Fixes mmav2 when the k dimension contains duplicated elements. For example, a 16x16 fp16 Triton tensor (opidx=0, kwidth=4) is laid out with a 16x32 tile, so the first 16 elements along the k dimension are repeated in the last 16. The mmav2 computation only needs the first half.
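
The duplication described in the second item can be illustrated with a small, simplified model. This is only a sketch: the wrap-around mapping and the helper names `tile_to_tensor_k` and `unique_k_prefix` are assumptions made for illustration and are not Triton code. It shows that when the tile is wider than the tensor along k, the second half of the tile reads the same tensor elements as the first half.

```python
def tile_to_tensor_k(tile_k, tensor_k):
    """Hypothetical model: tile k index -> tensor k index, wrapping around."""
    return [k % tensor_k for k in range(tile_k)]


def unique_k_prefix(tile_k, tensor_k):
    """Number of leading tile k indices that hold distinct tensor elements."""
    return min(tile_k, tensor_k)


# Numbers from the description: a 16x16 tensor covered by a 16x32 tile.
TENSOR_K = 16
TILE_K = 32

mapping = tile_to_tensor_k(TILE_K, TENSOR_K)
first_half = mapping[:TENSOR_K]
second_half = mapping[TENSOR_K:]

# The last 16 tile columns repeat the first 16, so only the prefix is needed.
needed = unique_k_prefix(TILE_K, TENSOR_K)
print(first_half == second_half, needed)
```

Under this model, a consumer that walks all 32 tile columns would process every tensor element twice, which is why only the first half is used.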

Jokeren · 2024-11-12T21:00:58Z

lib/Conversion/TritonGPUToLLVM/MemoryOpToLLVM.cpp

      if (auto mma = dyn_cast<NvidiaMmaEncodingAttr>(dot.getParent())) {
-        bool legacyLoweringIsBuggy = dot.getKWidth() >= 8;
+        bool legacyLoweringIsBuggy =
+            kWidth >= 8 || (kWidth == 4 && bitwidth == 32);


Let's enable this path by default soon for anything other than ldmatrix

lezcano

SGTM provided the tests exercise this case.
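
To see which configurations the changed condition newly routes away from the legacy lowering, the two predicates from the diff can be compared over a small grid. This is a sketch in Python rather than the C++ of the PR; the function names and the grid of `k_width` and `bitwidth` values are assumptions chosen for illustration.

```python
def legacy_buggy_before(k_width, bitwidth):
    """Condition on the removed line: depends only on kWidth."""
    return k_width >= 8


def legacy_buggy_after(k_width, bitwidth):
    """Condition on the added lines: also flags kWidth == 4 with 32-bit elements."""
    return k_width >= 8 or (k_width == 4 and bitwidth == 32)


# Illustrative grid only; the real set of valid combinations may differ.
K_WIDTHS = (1, 2, 4, 8, 16)
BITWIDTHS = (8, 16, 32)

newly_flagged = [
    (k, b)
    for k in K_WIDTHS
    for b in BITWIDTHS
    if legacy_buggy_after(k, b) and not legacy_buggy_before(k, b)
]
print(newly_flagged)
```

On this grid the only combination that changes is `kWidth == 4` with a 32-bit element type; every case the old condition flagged is still flagged.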

lezcano · 2024-11-13T18:38:31Z

third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/DotOpToLLVM/MMAv2.cpp

-          for (size_t e = 0; e < numElemsPerVec; ++e) {
-            si.push_back(kRep * numElemsPerVec + tile * kWidth + e);
-          }
+      if (kIters <= repK) {


nb. A way to simplify this logic is to invert the LL and then look at the register that holds the value for every top-left element of every tile.

Right. That's something I'll try out.
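
The suggestion above can be sketched with a toy one-dimensional model. Everything here is hypothetical: the layout is a plain dictionary from register index to k coordinate, not Triton's linear layout class, and in one dimension the "top-left element" of a tile reduces to its first k coordinate. The idea is to invert the forward map and then read off which register holds the first element of each tile.

```python
from collections import defaultdict


def toy_layout(num_regs, k_extent):
    """Hypothetical forward map: register index -> k coordinate (wraps around)."""
    return {reg: reg % k_extent for reg in range(num_regs)}


def invert(layout):
    """Inverse map: k coordinate -> sorted list of registers holding it."""
    inverse = defaultdict(list)
    for reg, k in sorted(layout.items()):
        inverse[k].append(reg)
    return dict(inverse)


def first_register_per_tile(layout, k_extent, tile_k):
    """For each tile along k, pick the lowest register holding its first element."""
    inverse = invert(layout)
    return [inverse[k0][0] for k0 in range(0, k_extent, tile_k)]


# 8 registers over a k extent of 4: registers 4..7 duplicate registers 0..3.
layout = toy_layout(num_regs=8, k_extent=4)
inverse = invert(layout)
starts = first_register_per_tile(layout, k_extent=4, tile_k=2)
print(inverse, starts)
```

Because the inverse map lists every register that holds a coordinate, duplicated registers show up as extra entries and can be ignored by always taking the first one, without special-casing the repeat count.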

python/test/regression/test_cast_matmul.py

ThomasRaoux

LGTM

lezcano

Thank you for updating the tests to exercise this new path!

mobicham · 2024-11-15T09:39:22Z

Could this possibly improve performance for this use-case? #4906 (comment)

Jokeren · 2024-11-15T13:48:58Z

I'm not sure. Feel free to try it out

mobicham · 2024-11-15T17:00:08Z

Currently, Triton built from the master branch is crashing with torch.compile, which is why I asked. Will definitely try it once this is resolved.

Jokeren added 2 commits November 11, 2024 22:40

Update

8a3d4b1

Update

1e9cf31

Jokeren changed the title ~~[BACKEND] Fix a special case where elements along the k dimension are repeated within each thread~~ [DRAFT][BACKEND] Fix a special case where elements along the k dimension are repeated within each thread Nov 12, 2024

Jokeren added 2 commits November 11, 2024 22:59

Update

c508d6f

Update

8300969

Jokeren mentioned this pull request Nov 12, 2024

CUDA_ERROR_ILLEGAL_ADDRESS on certain small tile sizes #5125

Open

Jokeren and others added 4 commits November 12, 2024 15:53

Update

5174f5f

Merge branch 'main' into keren/large-kwidth-fix

9a5779b

Update

621fabb

Merge branch 'keren/large-kwidth-fix' of github.com:openai/triton int…

e741196

…o keren/large-kwidth-fix

Jokeren commented Nov 12, 2024

Jokeren added 3 commits November 12, 2024 16:02

Update

ab39068

Update

2b4fe94

Update

631b9ad

Jokeren changed the title ~~[DRAFT][BACKEND] Fix a special case where elements along the k dimension are repeated within each thread~~ [BACKEND] Fix a special case where elements along the k dimension are repeated within each thread Nov 12, 2024

Jokeren marked this pull request as ready for review November 12, 2024 22:35

Jokeren requested a review from ptillet as a code owner November 12, 2024 22:35

Jokeren added 2 commits November 12, 2024 18:45

Update

82c6b02

Update

b16037f

Jokeren requested review from lezcano and ThomasRaoux November 13, 2024 01:46

lezcano approved these changes Nov 13, 2024

Jokeren added 8 commits November 13, 2024 18:40

Update K=32 and K=16 tests

7fe95ea

Update

20368e2

Merge branch 'keren/large-kwidth-fix' of github.com:openai/triton int…

ac40527

…o keren/large-kwidth-fix

Update

8e217db

Merge branch 'keren/large-kwidth-fix' of github.com:openai/triton int…

c6c6017

…o keren/large-kwidth-fix

Update

5858ec5

Merge branch 'keren/large-kwidth-fix' of github.com:openai/triton int…

b5489fe

…o keren/large-kwidth-fix

Update

1ccd078

ThomasRaoux approved these changes Nov 14, 2024

lezcano approved these changes Nov 14, 2024

lezcano merged commit 7f06338 into main Nov 14, 2024
7 checks passed

lezcano deleted the keren/large-kwidth-fix branch November 14, 2024 08:58
