[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP #24588
Conversation
Code Review
This pull request introduces support for Triton matmul-ogs kernels for GPT-OSS models with Data Parallelism and Expert Parallelism, specifically for mxfp4 quantization. This is done by adding a new OAITritonExperts class. The PR also includes substantial changes to dependency management, removing pinned versions of PyTorch and related packages across various requirement files. My review identified some leftover debugging print statements that should be removed before merging.
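For background, matmul_ogs fuses the per-expert matmul with the gather/scatter of routed tokens. Below is a rough PyTorch sketch of the semantics such a kernel implements for MoE; it is illustrative only, and the tensor layouts and names are assumptions, not the triton_kernels API:

```python
import torch

def moe_matmul_gather_scatter(
    x: torch.Tensor,         # [num_tokens, hidden] activations
    w: torch.Tensor,         # [num_experts, hidden, out] expert weights
    topk_ids: torch.Tensor,  # [num_tokens, topk] routed expert ids
) -> torch.Tensor:
    """Reference semantics of a gathered/scattered grouped matmul."""
    num_tokens, topk = topk_ids.shape
    out = x.new_zeros(num_tokens, topk, w.shape[-1])
    for expert in range(w.shape[0]):
        # Gather the (token, slot) pairs routed to this expert ...
        token_idx, slot_idx = (topk_ids == expert).nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue
        # ... run one dense matmul for them, then scatter results back.
        out[token_idx, slot_idx] = x[token_idx] @ w[expert]
    return out
```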
Status: getting an IMA (illegal memory access) during model startup.
Update: it appears the DeepEP kernels don't support the gpt-oss hidden size (2880). I have a PR in the DeepEP repo to fix this: deepseek-ai/DeepEP#408. I see correctness with it.
You can add padding here so we don't need to change the DeepEP side: vllm/model_executor/layers/fused_moe/layer.py, lines 837 to 848 (at 7ae9887).
This pull request has merge conflicts that must be resolved before it can be merged.
Nice idea 🙌 I have updated the code to add this padding. PTAL! Thanks 👍
vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py
```python
if hidden_size_bytes % 512 == 0:
    return hidden_size

# ...

hidden_size_bytes = round_up(hidden_size_bytes, xfer_atom_size)
return hidden_size_bytes // dtype.itemsize
```
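This rounds the hidden size up so each token's activations fill whole transfer atoms, which sidesteps the DeepEP limitation without patching DeepEP. A quick standalone check of the arithmetic (function and constant names here are assumptions mirroring the snippet above), assuming bfloat16 activations and a 512-byte transfer atom:

```python
import torch

def round_up(x: int, multiple: int) -> int:
    return ((x + multiple - 1) // multiple) * multiple

def padded_hidden_size(hidden_size: int, dtype: torch.dtype,
                       xfer_atom_size: int = 512) -> int:
    hidden_size_bytes = hidden_size * dtype.itemsize
    if hidden_size_bytes % 512 == 0:
        return hidden_size
    hidden_size_bytes = round_up(hidden_size_bytes, xfer_atom_size)
    return hidden_size_bytes // dtype.itemsize

# gpt-oss hidden size 2880 -> 5760 bytes in bf16, not a multiple of 512;
# rounding up to 6144 bytes pads the hidden size to 3072 elements.
assert padded_hidden_size(2880, torch.bfloat16) == 3072
```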
Can this be replaced with cdiv(hidden_size_bytes, xfer_atom_size)?
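For reference, cdiv by itself yields the number of atoms rather than the rounded-up byte count; the two helpers relate as below (a minimal sketch, assuming the usual vllm.utils definitions):

```python
def cdiv(a: int, b: int) -> int:
    # Ceiling division.
    return -(a // -b)

def round_up(x: int, multiple: int) -> int:
    # round_up is cdiv scaled back up by the multiple.
    return cdiv(x, multiple) * multiple

assert cdiv(5760, 512) == 12
assert round_up(5760, 512) == 6144  # == cdiv(5760, 512) * 512
```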
This pull request has merge conflicts that must be resolved before it can be merged.
```python
else:
    # TODO (bnell): This is a hack to get test_mixtral_moe to work
    # since model_config is not set in the pytest test.
    moe_in_dtype = params_dtype
```
I have little clue beyond this
Purpose
Enable DP/EP for the gpt-oss model via DeepEPHighThroughput and triton_kernels' matmul_ogs.
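A minimal offline sketch of such a launch follows; the engine-argument and environment-variable names are assumptions based on vLLM's EngineArgs and envs modules, so adjust to your setup:

```python
import os

# Select the DeepEP high-throughput all2all backend before vLLM is imported
# (assumed env var; other values include "deepep_low_latency" and "pplx").
os.environ["VLLM_ALL2ALL_BACKEND"] = "deepep_high_throughput"

from vllm import LLM, SamplingParams

llm = LLM(
    model="openai/gpt-oss-20b",   # gpt-oss checkpoint used for illustration
    data_parallel_size=2,         # DP across 2 ranks
    enable_expert_parallel=True,  # shard experts across ranks (EP)
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```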
Test Plan
Run the gpt-oss eval from the link.
Test Result
Benchmark
Please find benchmark results here
TL;DR: good TTFT numbers; bad TPOT numbers, as there are no CUDA graphs for decode.