[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 #21166
Conversation
Code Review
This pull request introduces support for mxfp6 and mixed mxfp6-mxfp4 formats, which is a valuable enhancement. The code changes are well-structured, particularly the refactoring to generalize from MXFP4 to a broader OCP MX scheme.
However, there are a few critical points that need to be addressed before this can be merged:
- Missing Tests: As you've noted in the description, tests are missing. Given the complexity of quantization and MoE layers, adding comprehensive tests for the new mxfp6 and mixed-precision schemes is crucial to ensure correctness and prevent future regressions.
- Potential Logic Change: A check for dynamic activation quantization appears to have been removed. This could be a significant logic change and needs clarification.
- In-code TODOs: There are several `TODO` comments in the code, indicating areas that might be incomplete or require verification. These should be resolved.
I've left specific comments on these points below. Addressing them will greatly improve the quality and reliability of this contribution.
vllm/model_executor/layers/quantization/quark/schemes/quark_ocp_mx.py
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @mgoin, does this update sound good to you? Happy to get comments if you have some time.
I reverted the change and added the prefix. Tests are good locally. We should really run them on ROCm CI (if there is one).
@fxmarty-amd It looks like the MoE kernel failures are related:
[2025-10-06T17:40:29Z] kernels/moe/test_batched_moe.py:306:
[2025-10-06T17:40:29Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-10-06T17:40:29Z] kernels/moe/utils.py:121: in naive_batched_moe
[2025-10-06T17:40:29Z] NaiveBatchedExperts(
[2025-10-06T17:40:29Z] /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_batched_moe.py:643: in __init__
[2025-10-06T17:40:29Z] assert self.quant_config.ocp_mx_scheme is None, "NYI"
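For context, a minimal sketch (not vLLM's actual implementation; the config class below is a stand-in) of the kind of guard that fires in the trace above: the naive batched-experts path has no OCP MX handling yet, so any quant config that carries an `ocp_mx_scheme` trips the `NYI` assertion.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuantConfig:                       # stand-in for the fused-MoE quant config
    ocp_mx_scheme: Optional[str] = None  # e.g. "mxfp4" or "mxfp6"; None = no OCP MX

class NaiveBatchedExperts:               # illustrative, not the real class body
    def __init__(self, quant_config: QuantConfig):
        # OCP MX (mxfp4/mxfp6) is not implemented for this kernel path yet,
        # so the test in the log fails as soon as such a scheme is passed in.
        assert quant_config.ocp_mx_scheme is None, "NYI"
        self.quant_config = quant_config
```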
Why could this PR pass CI? The test file was renamed but not updated in the test list: vllm/.buildkite/test-pipeline.yaml, line 832 (at 320feae).
@elvischenv Ohh, apologies, I probably should not have renamed it. Let me run it locally on H100. Do you see any failures?
cc @mgoin
As per title.
Support for MXFP4 in vLLM was added in #16943 and #17888.
However, some hardware such as AMD Instinct MI350/MI355 also supports math in mxfp6 or mixed fp4/fp6 (see e.g. https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-cdna4-instruction-set-architecture.pdf, with switches for the input dtype in V_MFMA_SCALE_F32_16X16X128_F8F6F4 and V_MFMA_SCALE_F32_32X32X64_F8F6F4), and adding support for it in vLLM (simulated for now) is only a small stretch compared to the current mxfp4 support.
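For readers unfamiliar with the format, here is a minimal sketch of what simulated ("fake") OCP MX quantization looks like: blocks of 32 values share one power-of-two (E8M0) scale, and each value is rounded to the low-precision element type (E2M3 for mxfp6, E2M1 for mxfp4). This is only an illustration under those assumptions, not the code added in this PR; the function names are made up and spec edge cases (e.g. NaN handling) are ignored.

```python
import torch

def _round_to_fp(v: torch.Tensor, exp_bits: int, mantissa_bits: int) -> torch.Tensor:
    """Round values to a small float grid (no inf/nan), e.g. E2M3 for fp6, E2M1 for fp4."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 1 - bias                    # exponent of the largest normal
    min_exp = 1 - bias                                    # exponent of the smallest normal
    max_val = (2.0 - 2.0 ** -mantissa_bits) * 2.0 ** max_exp
    sign, mag = torch.sign(v), v.abs()
    # Per-element exponent; values below the smallest normal reuse its exponent (subnormals).
    exp = torch.clamp(torch.floor(torch.log2(mag.clamp(min=1e-30))), min=min_exp)
    step = 2.0 ** (exp - mantissa_bits)                   # spacing of the mantissa grid
    return sign * (torch.round(mag / step) * step).clamp(max=max_val)

def fake_quant_ocp_mx(x: torch.Tensor, exp_bits: int, mantissa_bits: int,
                      block_size: int = 32) -> torch.Tensor:
    """Quantize-dequantize with one shared power-of-two scale per block of 32 elements."""
    elem_emax = 2 ** exp_bits - 1 - (2 ** (exp_bits - 1) - 1)
    xb = x.reshape(-1, block_size)
    amax = xb.abs().amax(dim=-1, keepdim=True).clamp(min=2.0 ** -126)
    scale = torch.exp2(torch.floor(torch.log2(amax)) - elem_emax)   # E8M0-style shared scale
    return (_round_to_fp(xb / scale, exp_bits, mantissa_bits) * scale).reshape(x.shape)

# mxfp6 (E2M3) weights with mxfp4 (E2M1) activations, both simulated:
w_dq = fake_quant_ocp_mx(torch.randn(64, 128), exp_bits=2, mantissa_bits=3)
a_dq = fake_quant_ocp_mx(torch.randn(8, 128), exp_bits=2, mantissa_bits=1)
```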
Concerning fused MoE, passing `OCP_MX_Scheme` as an `Enum` would result in errors, so I ended up passing it as a string instead, which is arguably not nice, but probably better than adding many `use_mxfp4_*: bool` / `use_mxfp6_*: bool` flags. Eventually a refactor is probably needed there.
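For illustration, the string-based plumbing could look roughly like the sketch below; the class name, field name, and scheme strings are hypothetical stand-ins, not this PR's exact API. A single optional string keeps the fused-MoE config simple and avoids both the Enum that caused the errors mentioned above and a growing pile of per-scheme booleans.

```python
from dataclasses import dataclass
from typing import Optional

# Example scheme names, purely illustrative.
_SUPPORTED_OCP_MX = {"mxfp4", "mxfp6_e2m3", "mxfp6_e3m2", "mxfp4_mxfp6"}

@dataclass(frozen=True)
class MoEQuantConfig:
    # One optional string instead of many use_mxfp4_* / use_mxfp6_* booleans
    # (and instead of an Enum instance).
    ocp_mx_scheme: Optional[str] = None

    def __post_init__(self) -> None:
        if self.ocp_mx_scheme is not None and self.ocp_mx_scheme not in _SUPPORTED_OCP_MX:
            raise ValueError(f"Unsupported OCP MX scheme: {self.ocp_mx_scheme!r}")
```

With this shape, a mixed mxfp6-mxfp4 configuration is just another string value rather than another pair of boolean flags.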
Left to do: