[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 #21166
Conversation
Code Review
This pull request introduces support for mxfp6 and mixed mxfp6-mxfp4 formats, which is a valuable enhancement. The code changes are well-structured, particularly the refactoring to generalize from MXFP4 to a broader OCP MX scheme.
However, there are a few critical points that need to be addressed before this can be merged:
- Missing Tests: As you've noted in the description, tests are missing. Given the complexity of quantization and MoE layers, adding comprehensive tests for the new mxfp6 and mixed-precision schemes is crucial to ensure correctness and prevent future regressions.
- Potential Logic Change: A check for dynamic activation quantization appears to have been removed. This could be a significant logic change and needs clarification.
- In-code TODOs: There are several `TODO` comments in the code, indicating areas that might be incomplete or require verification. These should be resolved.
I've left specific comments on these points below. Addressing them will greatly improve the quality and reliability of this contribution.
vllm/model_executor/layers/quantization/quark/schemes/quark_ocp_mx.py
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @mgoin, does this update sound good to you? Happy to get comments if you have some time.
I reverted the change and added the prefix. Tests are good locally. We should really run them on ROCm CI (if there is one).
@fxmarty-amd It looks like the MoE kernel failures are related:
[2025-10-06T17:40:29Z] kernels/moe/test_batched_moe.py:306:
[2025-10-06T17:40:29Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-10-06T17:40:29Z] kernels/moe/utils.py:121: in naive_batched_moe
[2025-10-06T17:40:29Z] NaiveBatchedExperts(
[2025-10-06T17:40:29Z] /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_batched_moe.py:643: in __init__
[2025-10-06T17:40:29Z] assert self.quant_config.ocp_mx_scheme is None, "NYI"
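For context, a minimal sketch (not vLLM's actual implementation; the config class below is a stand-in) of the kind of guard that fires in the trace above: the naive batched-experts path has no OCP MX handling yet, so any quant config that carries an `ocp_mx_scheme` trips the `NYI` assertion.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuantConfig:                       # stand-in for the fused-MoE quant config
    ocp_mx_scheme: Optional[str] = None  # e.g. "mxfp4" or "mxfp6"; None = no OCP MX

class NaiveBatchedExperts:               # illustrative, not the real class body
    def __init__(self, quant_config: QuantConfig):
        # OCP MX (mxfp4/mxfp6) is not implemented for this kernel path yet,
        # so the test in the log fails as soon as such a scheme is passed in.
        assert quant_config.ocp_mx_scheme is None, "NYI"
        self.quant_config = quant_config
```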
Why could this PR pass CI? The test file was renamed but not updated in the test list: vllm/.buildkite/test-pipeline.yaml, line 832 (at 320feae).
@elvischenv Ohh, apologies, I probably should not have renamed it. Let me run it locally on H100. Do you see any failures?
cc @mgoin
As per title.
Support for MXFP4 in vLLM was added in #16943 and #17888.
However, some hardware such as AMD Instinct MI350/MI355 also supports math in mxfp6 or mixed fp4/fp6 (see e.g. https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-cdna4-instruction-set-architecture.pdf, with switches for the input dtype in V_MFMA_SCALE_F32_16X16X128_F8F6F4 and V_MFMA_SCALE_F32_32X32X64_F8F6F4), and adding support for it in vLLM (simulated for now) is only a small stretch compared to the current mxfp4 support.
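For readers unfamiliar with the format, here is a minimal sketch of what simulated ("fake") OCP MX quantization looks like: blocks of 32 values share one power-of-two (E8M0) scale, and each value is rounded to the low-precision element type (E2M3 for mxfp6, E2M1 for mxfp4). This is only an illustration under those assumptions, not the code added in this PR; the function names are made up and spec edge cases (e.g. NaN handling) are ignored.

```python
import torch

def _round_to_fp(v: torch.Tensor, exp_bits: int, mantissa_bits: int) -> torch.Tensor:
    """Round values to a small float grid (no inf/nan), e.g. E2M3 for fp6, E2M1 for fp4."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 1 - bias                    # exponent of the largest normal
    min_exp = 1 - bias                                    # exponent of the smallest normal
    max_val = (2.0 - 2.0 ** -mantissa_bits) * 2.0 ** max_exp
    sign, mag = torch.sign(v), v.abs()
    # Per-element exponent; values below the smallest normal reuse its exponent (subnormals).
    exp = torch.clamp(torch.floor(torch.log2(mag.clamp(min=1e-30))), min=min_exp)
    step = 2.0 ** (exp - mantissa_bits)                   # spacing of the mantissa grid
    return sign * (torch.round(mag / step) * step).clamp(max=max_val)

def fake_quant_ocp_mx(x: torch.Tensor, exp_bits: int, mantissa_bits: int,
                      block_size: int = 32) -> torch.Tensor:
    """Quantize-dequantize with one shared power-of-two scale per block of 32 elements."""
    elem_emax = 2 ** exp_bits - 1 - (2 ** (exp_bits - 1) - 1)
    xb = x.reshape(-1, block_size)
    amax = xb.abs().amax(dim=-1, keepdim=True).clamp(min=2.0 ** -126)
    scale = torch.exp2(torch.floor(torch.log2(amax)) - elem_emax)   # E8M0-style shared scale
    return (_round_to_fp(xb / scale, exp_bits, mantissa_bits) * scale).reshape(x.shape)

# mxfp6 (E2M3) weights with mxfp4 (E2M1) activations, both simulated:
w_dq = fake_quant_ocp_mx(torch.randn(64, 128), exp_bits=2, mantissa_bits=3)
a_dq = fake_quant_ocp_mx(torch.randn(8, 128), exp_bits=2, mantissa_bits=1)
```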
Concerning fused MoE, passing `OCP_MX_Scheme` as an `Enum` would result in errors, so I ended up passing it as a string instead, which is arguably not nice, but probably better than adding many `use_mxfp4_*: bool` / `use_mxfp6_*: bool` flags. Eventually a refactor is probably needed there.
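For illustration, the string-based plumbing could look roughly like the sketch below; the class name, field name, and scheme strings are hypothetical stand-ins, not this PR's exact API. A single optional string keeps the fused-MoE config simple and avoids both the Enum that caused the errors mentioned above and a growing pile of per-scheme booleans.

```python
from dataclasses import dataclass
from typing import Optional

# Example scheme names, purely illustrative.
_SUPPORTED_OCP_MX = {"mxfp4", "mxfp6_e2m3", "mxfp6_e3m2", "mxfp4_mxfp6"}

@dataclass(frozen=True)
class MoEQuantConfig:
    # One optional string instead of many use_mxfp4_* / use_mxfp6_* booleans
    # (and instead of an Enum instance).
    ocp_mx_scheme: Optional[str] = None

    def __post_init__(self) -> None:
        if self.ocp_mx_scheme is not None and self.ocp_mx_scheme not in _SUPPORTED_OCP_MX:
            raise ValueError(f"Unsupported OCP MX scheme: {self.ocp_mx_scheme!r}")
```

With this shape, a mixed mxfp6-mxfp4 configuration is just another string value rather than another pair of boolean flags.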
Left to do: