mx: triton kernel to cast to mx and write in col-major #1932
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1932
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 7ecd79f with merge base 3fb1665. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
# example transformation (specifics depend on tile sizes):
# [0, 1, 2, 3, 4, 5, 6, 7] -> [0, 1, 4, 5, 8, 9, 12, 13]
col_scale_indices = col_scale_indices + (
    tl.floor(col_scale_indices / BLOCKS_PER_ROW_TILE) * jump_vals_per_col
i think we should just be doing integer division instead of floor + /
looks good ! next step: compile to generate this
).to(tl.int32)
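A minimal sketch of the integer-division suggestion (assuming col_scale_indices is an int32 tensor and BLOCKS_PER_ROW_TILE is a tl.constexpr): // performs floor division directly, so the float round-trip through tl.floor and the cast back to int32 go away.

    # integer floor division keeps everything in int32
    col_scale_indices = col_scale_indices + (
        (col_scale_indices // BLOCKS_PER_ROW_TILE) * jump_vals_per_col
    )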
# TODO(future): mask this store
tl.store(col_scale_start_ptr + col_scale_indices, col_scale_e8m0)
in the launcher, should we assert divisibility of block sizes, so we hard error for this case?
yes, for now I hackily assert that on L1319:L1322
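A sketch of such a hard error in the launcher (the size and tile-size names here are hypothetical, not the PR's actual variable names):

    # hard error until the store above is masked
    assert n_rows % ROW_TILE_SIZE == 0, "n_rows must be divisible by ROW_TILE_SIZE"
    assert n_cols % COL_TILE_SIZE == 0, "n_cols must be divisible by COL_TILE_SIZE"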
)

return (
    output_col_major.t(),
since only the data_ptr of output_col_major is used when you pass it into triton, you could initialize it with the correct strides
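A minimal sketch of that suggestion (shapes, dtype, and variable names are assumptions for illustration): allocate the output with column-major strides up front, so the kernel writes into it directly and no transpose is needed on return.

    # column-major layout for an (n_rows, n_cols) output; the Triton kernel only
    # consumes output_col_major.data_ptr(), the strides just describe the layout
    # to downstream PyTorch code
    output_col_major = torch.empty_strided(
        (n_rows, n_cols), (1, n_rows), dtype=torch.float8_e4m3fn, device=x.device
    )
    ...
    return output_col_major  # no .t() needed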
return (
    output_col_major.t(),
    col_scale.reshape(-1, 1).view(torch.float8_e8m0fnu),
same thing here..
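Following the same idea (a sketch; the scale tensor's size name here is an assumption): allocate col_scale with its final shape and dtype so the reshape/view on return becomes unnecessary.

    # one e8m0 scale per block, already in the (-1, 1) shape the caller expects;
    # the Triton kernel only uses col_scale.data_ptr()
    col_scale = torch.empty(
        n_scale_blocks, 1, dtype=torch.float8_e8m0fnu, device=x.device
    )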
Summary:
Implements a triton kernel for a cast to mxfp8 from a row-major input across dim1, which is 3.5x to 4.5x faster than what compile can generate today. Note that this is a prototype kernel, and I expect to (a) improve it in future PRs and (b) delete it in ~weeks when we have compile support for this.
An integration into MXLinear will follow in a separate PR.

Example of tiling (simplified for small example size):
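For a concrete feel for the column-major scale indexing, here is a minimal torch sketch of the index transformation from the kernel snippet above, using illustrative values BLOCKS_PER_ROW_TILE = 2 and jump_vals_per_col = 2 (assumptions chosen to reproduce the example in the kernel comment, not the kernel's actual tile sizes):

    import torch

    col_scale_indices = torch.arange(8)
    BLOCKS_PER_ROW_TILE = 2
    jump_vals_per_col = 2
    # remap row-major scale indices to their column-major storage locations
    col_scale_indices = col_scale_indices + (
        (col_scale_indices // BLOCKS_PER_ROW_TILE) * jump_vals_per_col
    )
    print(col_scale_indices)  # tensor([ 0,  1,  4,  5,  8,  9, 12, 13])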
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: