[Misc] Add triton_kernels dependency #27370
Conversation
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Code Review
This pull request adds triton_kernels as a dependency to support mxfp4 fused MoE operations. The change itself is correct in identifying the necessary package and version. However, there is a critical concern regarding the packaging for Docker. The new dependency is only added to requirements/cuda.txt, which may not be sufficient for it to be included in the final production Docker images. This could lead to runtime failures. I've added a comment with details on how to address this potential issue.
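As a quick, illustrative way to verify the packaging concern above, one could run a check like the following inside the built image. This is a hedged sketch, not part of this PR or of vLLM's build tooling.

```python
# Illustrative check (not part of this PR): run inside the final Docker
# image to confirm that triton_kernels was actually installed there.
import importlib.util

if importlib.util.find_spec("triton_kernels") is None:
    raise SystemExit("triton_kernels is not installed in this image")
print("triton_kernels is available")
```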
)
from vllm.model_executor.layers.fused_moe.fused_moe import fused_topk
from vllm.model_executor.layers.fused_moe.gpt_oss_triton_kernels_moe import (
    BatchedOAITritonExperts,
BatchedOAITritonExperts was removed in PR #24588 and I missed removing it from the tests.
For context, even though the matmul_ogs kernel from OpenAI Triton Kernels supports batched mode, it was removed because it is simply a dense GEMM (it does not mask invalid tokens) and is not useful for the WideEP case.
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
FYI - this is breaking the nightly installation.

It seems this PR also blocks wheel install (with the same uv pip install error message as above).

@varun-sundar-rabindranath can we revert for now to unblock? We can make our own wheel or copy the kernels over.

Yes. I am reverting this now 👍
Purpose
Add triton_kernels from https://github.com/triton-lang/triton/tree/main/python/triton_kernels as a dependency and pin it to tag v3.5.0.

Why v3.5.0:
triton_kernels is just a sub-directory in the Triton repo. vLLM now supports Torch 2.9, and Torch 2.9 ships with Triton 3.5.0.
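For illustration, a pin of that shape in requirements/cuda.txt could look like the line below. This is one plausible form (a pip VCS reference with a subdirectory pointing at the sub-package), not necessarily the exact line added by this PR.

```
# Hypothetical example of pinning the triton_kernels sub-package to the
# v3.5.0 tag of the Triton repository; the exact line in this PR may differ.
triton_kernels @ git+https://github.com/triton-lang/triton.git@v3.5.0#subdirectory=python/triton_kernels
```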
Why add triton_kernels as a dependency?

We use the matmul_ogs function from triton_kernels for mxfp4 fused_moe operations on Hopper. At the moment, this code path is the fastest way to run mxfp4 models on Hopper, but users have to install triton_kernels manually to access it; with this change, they can use it out of the box.
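To illustrate the out-of-the-box point, here is a minimal sketch of the usual gating pattern for an optional dependency. It assumes the import path triton_kernels.matmul_ogs and uses placeholder backend names; it is not vLLM's actual code.

```python
# Minimal sketch (not vLLM's actual code): gate the fast mxfp4 path on
# whether the optional triton_kernels package is installed.
try:
    # Assumed import path for the matmul_ogs kernel from triton_kernels.
    from triton_kernels.matmul_ogs import matmul_ogs  # noqa: F401
    HAS_TRITON_KERNELS = True
except ImportError:
    HAS_TRITON_KERNELS = False


def pick_mxfp4_moe_backend() -> str:
    """Prefer the triton_kernels matmul_ogs path when available (e.g. on Hopper)."""
    if HAS_TRITON_KERNELS:
        return "triton_kernels_matmul_ogs"
    # Placeholder for whatever fallback implementation is configured.
    return "fallback_mxfp4_backend"
```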
Test Plan

Tried a fresh build locally and executed:
TP:

vllm serve openai/gpt-oss-120b --tensor-parallel-size 2 --no-enable-prefix-caching

DP:

VLLM_ALL2ALL_BACKEND="deepep_high_throughput" vllm serve openai/gpt-oss-120b --data-parallel-size 2 --enable-expert-parallel --no-enable-prefix-caching

Test Result
On Hopper, both commands default to using the Triton implementation for mxfp4.
Both commands produce reasonable gpt_oss eval metrics.
Solves issue #26582.