
Conversation

@fxmarty-amd
Contributor

@fxmarty-amd fxmarty-amd commented Jul 18, 2025

As per title.

Support for MXFP4 in vllm was added in #16943 and #17888.

However, some hardware, such as AMD Instinct MI350/MI355, also supports math in mxfp6 or mixed fp4/fp6 (see e.g. https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-cdna4-instruction-set-architecture.pdf, with switches for the input dtype in V_MFMA_SCALE_F32_16X16X128_F8F6F4 and V_MFMA_SCALE_F32_32X32X64_F8F6F4), and adding support for it in vllm (simulated for now) is only a small stretch compared to the current mxfp4 support.
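
For context, the OCP MX formats at play here all share one structure: 32-element blocks with a shared power-of-two (E8M0) scale, differing only in the element encoding (fp4 e2m1, fp6 e3m2, fp6 e2m3). Below is a minimal sketch of simulated (fake) mxfp4 quantization; the helper name and grid constants are illustrative, not vLLM's actual implementation, and the fp6 grids would be built analogously from their bit layouts:

```python
import torch

# Non-negative magnitudes representable in fp4 e2m1 (the mxfp4 element type);
# the fp6 e3m2/e2m3 grids are built the same way from their bit layouts.
MXFP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
MXFP4_EMAX = 2  # max element exponent: 6.0 == 1.5 * 2**2


def simulated_mxfp4_quant(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    """Fake-quantize x to mxfp4: one shared E8M0 (power-of-two) scale per
    32-element block, elements snapped to the e2m1 grid, then dequantized."""
    xb = x.reshape(-1, block)
    amax = xb.abs().amax(dim=-1, keepdim=True)
    # Shared block exponent so the block maximum maps into the element range.
    scale = torch.where(
        amax > 0,
        torch.pow(2.0, torch.floor(torch.log2(amax)) - MXFP4_EMAX),
        torch.ones_like(amax),
    )
    mag = (xb / scale).abs().clamp(max=6.0)
    # Snap each element magnitude to the nearest representable value.
    idx = (mag.unsqueeze(-1) - MXFP4_GRID).abs().argmin(dim=-1)
    return (MXFP4_GRID[idx] * xb.sign() * scale).reshape(x.shape)
```

Values already on the grid round-trip exactly, which is why hardware like CDNA4 can treat fp4/fp6 uniformly behind one scaled-MFMA instruction.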

Concerning fused MOE, passing OCP_MX_Scheme as an Enum would result in

ValueError: infer_schema(func): Parameter ocp_mx_scheme has unsupported type typing.Optional[vllm.model_executor.layers.quantization.utils.ocp_mx_utils.OCP_MX_Scheme]. The valid types are: dict_keys([<class 'torch.Tensor'>, typing.Optional[torch.Tensor], typing.Sequence[torch.Tensor], typing.List[torch.Tensor], collections.abc.Sequence[torch.Tensor], list[torch.Tensor], typing.Sequence[typing.Optional[torch.Tensor]], typing.List[typing.Optional[torch.Tensor]], collections.abc.Sequence[typing.Optional[torch.Tensor]], list[typing.Optional[torch.Tensor]], <class 'int'>, typing.Optional[int], typing.Sequence[int], typing.List[int], collections.abc.Sequence[int], list[int], typing.Optional[typing.Sequence[int]], typing.Optional[typing.List[int]], typing.Optional[collections.abc.Sequence[int]], typing.Optional[list[int]], <class 'float'>, typing.Optional[float], typing.Sequence[float], typing.List[float], collections.abc.Sequence[float], list[float], typing.Optional[typing.Sequence[float]], typing.Optional[typing.List[float]], typing.Optional[collections.abc.Sequence[float]], typing.Optional[list[float]], <class 'bool'>, typing.Optional[bool], typing.Sequence[bool], typing.List[bool], collections.abc.Sequence[bool], list[bool], typing.Optional[typing.Sequence[bool]], typing.Optional[typing.List[bool]], typing.Optional[collections.abc.Sequence[bool]], typing.Optional[list[bool]], <class 'str'>, typing.Optional[str], typing.Union[int, float, bool], typing.Union[int, float, bool, NoneType], typing.Sequence[typing.Union[int, float, bool]], typing.List[typing.Union[int, float, bool]], collections.abc.Sequence[typing.Union[int, float, bool]], list[typing.Union[int, float, bool]], <class 'torch.dtype'>, typing.Optional[torch.dtype], <class 'torch.device'>, typing.Optional[torch.device]]).

so I ended up passing it as a string instead, which is arguably not nice, but probably better than adding many use_mxfp4_*: bool / use_mxfp6_*: bool flags. Eventually a refactor is probably needed there.

Left to do:

  • Add tests
  • Documentation
  • Integrate OCP MX kernels (left to another PR; there is ongoing work in ROCm/aiter and composable kernel)

Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for mxfp6 and mixed mxfp6-mxfp4 formats, which is a valuable enhancement. The code changes are well-structured, particularly the refactoring to generalize from MXFP4 to a broader OCP MX scheme.

However, there are a few critical points that need to be addressed before this can be merged:

  1. Missing Tests: As you've noted in the description, tests are missing. Given the complexity of quantization and MoE layers, adding comprehensive tests for the new mxfp6 and mixed-precision schemes is crucial to ensure correctness and prevent future regressions.
  2. Potential Logic Change: A check for dynamic activation quantization appears to have been removed. This could be a significant logic change and needs clarification.
  3. In-code TODOs: There are several TODO comments in the code, indicating areas that might be incomplete or require verification. These should be resolved.

I've left specific comments on these points below. Addressing them will greatly improve the quality and reliability of this contribution.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Signed-off-by: Felix Marty <Felix.Marty@amd.com>
…t enum, fix dynamo issues

Signed-off-by: Felix Marty <Felix.Marty@amd.com>
@mergify

mergify bot commented Jul 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fxmarty-amd.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 21, 2025
@mergify mergify bot removed the needs-rebase label Jul 22, 2025
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
@fxmarty-amd fxmarty-amd requested a review from hmellor as a code owner July 23, 2025 10:04
@mergify mergify bot added the documentation Improvements or additions to documentation label Jul 23, 2025
@fxmarty-amd
Contributor Author

Hi @mgoin, does this update sound good to you? Happy to get comments if you have some time.

Signed-off-by: Felix Marty <Felix.Marty@amd.com>
@mergify mergify bot removed the needs-rebase label Oct 6, 2025
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
@fxmarty-amd
Contributor Author

fxmarty-amd commented Oct 6, 2025

> Why move away from "mxfp4"? Since we have the scale here, it is mxfp4, right?

I reverted the change and prefixed with mx everywhere (mxfp6_e3m2, mxfp4, etc.).

Tests are good locally. We should really run them on ROCm CI (if there is one).

Member

@mgoin mgoin left a comment


@fxmarty-amd It looks like the MoE kernel failures are related:

[2025-10-06T17:40:29Z] kernels/moe/test_batched_moe.py:306:
[2025-10-06T17:40:29Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-10-06T17:40:29Z] kernels/moe/utils.py:121: in naive_batched_moe
[2025-10-06T17:40:29Z]     NaiveBatchedExperts(
[2025-10-06T17:40:29Z] /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_batched_moe.py:643: in __init__
[2025-10-06T17:40:29Z]     assert self.quant_config.ocp_mx_scheme is None, "NYI"
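
For anyone hitting this path, the bare `assert ..., "NYI"` could be replaced by an explicit guard; a hypothetical sketch (class and attribute names mirror the traceback, the body is illustrative, not vLLM's code):

```python
class NaiveBatchedExperts:
    """Sketch of a friendlier guard for the not-yet-implemented OCP MX path."""

    def __init__(self, quant_config):
        self.quant_config = quant_config
        scheme = getattr(quant_config, "ocp_mx_scheme", None)
        if scheme is not None:
            # An explicit error beats `assert ..., "NYI"`: it survives
            # `python -O` and tells the user what is unsupported and why.
            raise NotImplementedError(
                f"OCP MX scheme {scheme!r} is not yet implemented for "
                "NaiveBatchedExperts; only non-MX quant configs are supported."
            )
```

The test failure above would then surface as a skippable NotImplementedError rather than a bare AssertionError.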

Signed-off-by: Felix Marty <Felix.Marty@amd.com>
@fxmarty-amd fxmarty-amd requested a review from mgoin October 6, 2025 23:11
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
@mgoin mgoin merged commit 41f1cf3 into vllm-project:main Oct 7, 2025
55 checks passed
@github-project-automation github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Oct 7, 2025
@elvischenv
Contributor

Why could this PR pass CI? The test file was renamed, but the CI test list was not updated:

- pytest -v -s tests/kernels/moe/test_mxfp4_moe.py

https://buildkite.com/vllm/ci/builds/33836/steps/canvas?jid=0199bf13-7dcd-421a-8fac-d9bd04ad974d#0199bf13-7dcd-421a-8fac-d9bd04ad974d/90-1836

ERROR: file or directory not found: tests/kernels/moe/test_mxfp4_moe.py
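
Staleness like this can be caught mechanically by checking that every pytest path referenced in the CI pipeline config still exists on disk. A hypothetical sketch (the helper name and the regex over the config text are assumptions, not an existing vLLM utility):

```python
import re
from pathlib import Path


def stale_test_paths(pipeline_text: str, repo_root: Path) -> list[str]:
    """Return pytest file paths referenced in a CI config that no longer
    exist under repo_root (e.g. after a test file was renamed)."""
    referenced = re.findall(r"pytest[^\n]*?\s(tests/\S+\.py)", pipeline_text)
    return [p for p in referenced if not (repo_root / p).exists()]
```

Run over the Buildkite pipeline config in a repo-level sanity test, a rename like the one here would then fail fast instead of silently skipping the suite.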

@fxmarty-amd
Contributor Author

fxmarty-amd commented Oct 7, 2025

@elvischenv Ohh, apologies, I probably should not have renamed it. Let me run it locally on H100. Do you see any failures?

@fxmarty-amd
Contributor Author

cc @mgoin


Labels

  • ci/build
  • deepseek (Related to DeepSeek models)
  • documentation (Improvements or additions to documentation)
  • frontend
  • gpt-oss (Related to GPT-OSS models)
  • kv-connector
  • llama (Related to Llama models)
  • multi-modality (Related to multi-modality (#4194))
  • new-model (Requests to new models)
  • performance (Performance-related issues)
  • quantization
  • qwen (Related to Qwen models)
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • rocm (Related to AMD ROCm)
  • speculative-decoding
  • structured-output
  • tool-calling
  • v1

Projects

Status: Done

7 participants