add ucclep integration #1

Open

CalebZ9909 wants to merge 1 commit into main from ucclep-integration

Conversation

@CalebZ9909
Collaborator

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison or e2e results.
  • (Optional) Any necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

@CalebZ9909
Collaborator Author

CalebZ9909 commented Nov 5, 2025

I ran into a weird error: I triggered a CUDA Triton Error: [invalid argument] with the following message (two engine processes, DP5 and DP6, printed their tracebacks interleaved; they are separated and de-colorized below):

(EngineCore_DP6 pid=3065384)   File "/fsx/ubuntu/uccl_yihan/vllm/vllm/model_executor/layers/fused_moe/fused_batched_moe.py", line 447, in invoke_moe_batched_triton_kernel
(EngineCore_DP6 pid=3065384)     batched_triton_kernel[grid](
(EngineCore_DP6 pid=3065384)     ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(EngineCore_DP6 pid=3065384)         A,
(EngineCore_DP6 pid=3065384)         ^^
(EngineCore_DP6 pid=3065384)     ...<38 lines>...
(EngineCore_DP6 pid=3065384)         BLOCK_K=BLOCK_K,
(EngineCore_DP6 pid=3065384)         ^^^^^^^^^^^^^^^^
(EngineCore_DP6 pid=3065384)     )
(EngineCore_DP6 pid=3065384)     ^
(EngineCore_DP6 pid=3065384)   File "/fsx/ubuntu/miniconda3/lib/python3.13/site-packages/triton/runtime/jit.py", line 390, in <lambda>
(EngineCore_DP6 pid=3065384)     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)

(EngineCore_DP5 pid=3065380)   File "/fsx/ubuntu/miniconda3/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_DP5 pid=3065380)     return forward_call(*args, **kwargs)
(EngineCore_DP5 pid=3065380)   File "/fsx/ubuntu/uccl_yihan/vllm/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1160, in forward
(EngineCore_DP5 pid=3065380)     fused_out = self._fused_experts(
(EngineCore_DP5 pid=3065380)         in_dtype=hidden_states.dtype,
(EngineCore_DP5 pid=3065380)     ...<11 lines>...
(EngineCore_DP5 pid=3065380)         expert_tokens_meta=expert_tokens_meta,
(EngineCore_DP5 pid=3065380)     )
(EngineCore_DP5 pid=3065380)   File "/fsx/ubuntu/uccl_yihan/vllm/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1013, in _fused_experts
(EngineCore_DP5 pid=3065380)     self.fused_experts.apply(

The traceback continues deeper into the Triton driver code, which I did not paste here. I don't know why this happened (maybe some parameter issue?).
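
As a generic debugging step (the helper below is mine, not part of vLLM), I am going to log the launch configuration right before the batched_triton_kernel[grid](...) call at fused_batched_moe.py:447, since a grid dimension or block constant that the device rejects is a common cause of "invalid argument" at launch time. A minimal sketch:

# Hypothetical debugging helper, not part of vLLM: call it immediately
# before batched_triton_kernel[grid](...) to record what is being launched.
import torch

def log_launch_config(grid, **kernel_args):
    # An out-of-range grid dimension or an oversized block constant is a
    # common cause of "invalid argument" at CUDA kernel-launch time.
    print(f"launch grid={grid}")
    for name, value in kernel_args.items():
        if torch.is_tensor(value):
            print(f"  {name}: shape={tuple(value.shape)} dtype={value.dtype} "
                  f"contiguous={value.is_contiguous()}")
        else:
            print(f"  {name}: {value!r}")

Running with CUDA_LAUNCH_BLOCKING=1 can also help rule out the possibility that an earlier asynchronous CUDA failure is only being reported at this launch.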
According to https://docs.vllm.ai/en/stable/serving/expert_parallel_deployment.html#configuration, I set all of the options listed there. I am not sure whether export VLLM_MOE_DP_CHUNK_SIZE= affects how much memory Triton will allocate; right now I just set it to a number that is large enough, and I am checking whether it can cause a memory issue.
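
For reference, this is roughly how I am setting it (a sketch: 256 is a placeholder, not the value I actually use, and I am assuming VLLM_MOE_DP_CHUNK_SIZE is among the variables defined in vllm/envs.py; it has to be in the environment before the engine processes start):

import os

# Placeholder value; I currently just pick "a number that is large enough".
os.environ.setdefault("VLLM_MOE_DP_CHUNK_SIZE", "256")

import vllm.envs as envs
print(envs.VLLM_MOE_DP_CHUNK_SIZE)  # confirm the value vLLM will actually use

If this value really does size the batched MoE workspaces, a too-large setting should show up as oversized kernel arguments in the launch log above, which is what I want to confirm.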

