
Conversation

@varun-sundar-rabindranath (Contributor) commented Oct 22, 2025

Purpose

Add triton_kernels from https://github.com/triton-lang/triton/tree/main/python/triton_kernels as a dependency and pin it to tag v3.5.0.

Why v3.5.0:
triton_kernels is just a subdirectory in the Triton repo. vLLM now supports Torch 2.9, and Torch 2.9 ships with Triton 3.5.0, so we pin to the matching tag.

Why add triton_kernels as a dependency?
We use the matmul_ogs function from triton_kernels for mxfp4 fused_moe operations on Hopper. This code path is currently the fastest way to run mxfp4 models on Hopper, but users have to install triton_kernels manually to reach it. With this change, it works out of the box.
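For reference, the manual install this replaces looks roughly like the command below (a sketch, not taken from this PR's diff; the pin spec matches the one uv suggests in the error message further down the thread):

# Rough sketch of the manual install: pull triton_kernels from the Triton repo,
# pinned to tag v3.5.0, using only the python/triton_kernels subdirectory.
uv pip install "triton-kernels @ git+https://github.com/triton-lang/triton.git@v3.5.0#subdirectory=python/triton_kernels"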

Test Plan

Tried a fresh build locally and executed the following commands:
TP : vllm serve openai/gpt-oss-120b --tensor-parallel-size 2 --no-enable-prefix-caching
DP : VLLM_ALL2ALL_BACKEND="deepep_high_throughput" vllm serve openai/gpt-oss-120b --data-parallel-size 2 --enable-expert-parallel --no-enable-prefix-caching
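As an extra spot check (not part of the original test plan), one could confirm the package resolves in the fresh environment before launching the server:

# Suggested sanity check: verify triton_kernels is importable in the fresh build.
python -c "import triton_kernels; print('triton_kernels import OK')"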

Test Result

On Hopper, both commands default to using the Triton implementation for mxfp4.
Both commands produce reasonable gpt_oss eval metrics.

Solves issue #26582

Varun Sundar Rabindranath added 2 commits October 22, 2025 18:30
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
@mergify (bot) added the ci/build label Oct 22, 2025
@gemini-code-assist (bot) left a comment

Code Review

This pull request adds triton_kernels as a dependency to support mxfp4 fused MoE operations. The change itself is correct in identifying the necessary package and version. However, there is a critical concern regarding the packaging for Docker. The new dependency is only added to requirements/cuda.txt, which may not be sufficient for it to be included in the final production Docker images. This could lead to runtime failures. I've added a comment with details on how to address this potential issue.
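A possible way to verify the packaging concern above (hypothetical image tag; not something done in this PR) is to try the import inside the built image:

# Hypothetical spot check; "vllm/vllm-openai:local-test" is a placeholder for a locally built image tag.
docker run --rm --entrypoint python3 vllm/vllm-openai:local-test -c "import triton_kernels; print('present in image')"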


@mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label Oct 22, 2025
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
)
from vllm.model_executor.layers.fused_moe.fused_moe import fused_topk
from vllm.model_executor.layers.fused_moe.gpt_oss_triton_kernels_moe import (
    BatchedOAITritonExperts,
@varun-sundar-rabindranath (Contributor Author) left a comment:

BatchedOAITritonExperts was removed in PR #24588 and I missed removing it from the tests.
For context, even though the matmul_ogs kernel from OpenAI Triton Kernels supports batched mode, that path was removed because it is simply a dense GEMM (it does not mask invalid tokens) and is not useful for the WideEP case.

Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
@github-project-automation (bot) moved this from To Triage to Ready in gpt-oss Issues & Enhancements Oct 23, 2025
@vllm-bot merged commit a9f55dc into vllm-project:main Oct 23, 2025
86 of 88 checks passed
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
@ywang96 (Member) commented Oct 26, 2025

FYI - this is breaking the nightly installation

(new) coder@4xl40s devspaces (main) $ uv pip install vllm --extra-index-url https://wheels.vllm.ai/nightly
Using Python 3.12.11 environment at: new
  × Failed to resolve dependencies for `vllm` (v0.11.1rc4.dev17+g361a7463d.cu129)
  ╰─▶ Package `triton-kernels` was included as a URL dependency. URL dependencies must be expressed as direct requirements or constraints.
      Consider adding `triton-kernels @ git+https://github.com/triton-lang/triton.git@v3.5.0#subdirectory=python/triton_kernels` to your
      dependencies or constraints file.
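Following uv's own suggestion in the error, a possible local workaround (untested here) is to declare the URL dependency directly on the command line, using the exact spec from the message:

# Possible workaround sketch: make the URL dependency a direct requirement so the resolver accepts it.
uv pip install "triton-kernels @ git+https://github.com/triton-lang/triton.git@v3.5.0#subdirectory=python/triton_kernels" vllm --extra-index-url https://wheels.vllm.ai/nightly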

@cjackal (Contributor) commented Oct 28, 2025

It seems this PR also blocks the wheel install (with the same uv pip install error message as above).

@mgoin (Member) commented Oct 28, 2025

@varun-sundar-rabindranath can we revert for now to unblock? We can make our own wheel or copy the kernels over

@varun-sundar-rabindranath (Contributor Author) commented:

Yes. I am reverting this now 👍

@varun-sundar-rabindranath (Contributor Author) commented:

The PR to revert the requirement is here: #27659
cc @mgoin @cjackal @ywang96


Labels

ci/build, gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done
