[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. #22035
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review

This pull request introduces a significant and valuable refactoring of the Fused MoE (Mixture of Experts) kernels. The core of the change is the cleanup of `FusedMoEMethodBase` and the removal of `extra_..._args` from various modular kernel methods. This is achieved by moving configuration into instance attributes, primarily through the new `__init__` method in `FusedMoEMethodBase`, which now takes a `FusedMoEConfig` object. This change greatly improves code clarity and maintainability by adopting a more object-oriented approach.

I've found one critical issue where the new logic could lead to an `AssertionError` and a crash under specific configurations related to FlashInfer kernels. A fix has been suggested. Beyond this, the refactoring appears solid and consistent across the codebase.
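To make the pattern concrete, here is a minimal sketch of the constructor-based configuration the review describes. The class, attribute, and method names mirror the ones named in this PR (`FusedMoEMethodBase`, `FusedMoEConfig`, `fused_experts`, `topk_indices_dtype`, `init_prepare_finalize`), but the config fields and method bodies are assumptions for illustration, not the actual vLLM implementation:

```python
from typing import Optional

import torch


class FusedMoEConfig:
    """Stand-in for vLLM's FusedMoEConfig; these fields are assumptions."""

    def __init__(self, num_experts: int, use_ep: bool, dp_size: int) -> None:
        self.num_experts = num_experts
        self.use_ep = use_ep
        self.dp_size = dp_size


class FusedMoEMethodBase:
    """Sketch of the constructor pattern described in the review above."""

    def __init__(self, moe: FusedMoEConfig) -> None:
        # Configuration is captured once at construction time instead of
        # being threaded through every call as extra_*_args.
        self.moe = moe
        # Set to None here and assigned exactly once by
        # init_prepare_finalize(), so other methods never guess its state.
        self.fused_experts = None
        # Initialized to None and then passed to every select_experts call.
        self.topk_indices_dtype: Optional[torch.dtype] = None

    def init_prepare_finalize(self) -> None:
        # Per the PR, this runs whenever EP is enabled, whether or not
        # DP > 1; the real body is elided here.
        assert self.fused_experts is None
        ...
```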
Review thread on `vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py` (outdated, resolved).
Very nice and much-needed set of cleanups!! Thanks @bnellnm
Force-pushed from a8a9e86 to 0bff038.
Signed-off-by: Bill Nell <bnell@redhat.com>
Great work Bill, looking forward to the follow-up packaging!
[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (vllm-project#22035) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>
Purpose
- Pass `FusedMoEConfig` to all `FusedMoEMethodBase` object constructors.
- `self.fused_experts` is set to None in the constructor and only set once by `init_prepare_finalize`.
- `topk_indices_dtype` is initialized to None and used in all `select_experts` calls.
- Remove `extra_*` arguments to modular kernels by capturing the relevant parameters at construction time (see the sketch after this list).
- Allow `FusedMoEMethodBase` subclasses to select prepare/finalize objects before deferring to the default mechanism.
- Call `init_prepare_and_finalize` whenever EP is enabled, whether or not DP > 1.
- Add `csrc/quantization/cutlass_w8a8/moe` to the buildkite YAML file.
- Update `test_modular_kernel_combinations.py`.
- Add a `test_flashinfer_moe.py` test for `FlashInferExperts`.

A follow-up PR will capture more parameters at construction time to reduce the size of the `apply`/`forward` argument lists.
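The following before/after sketch illustrates the `extra_*` removal referenced above. `ExamplePrepareAndFinalize`, `block_size`, and the padding helper are hypothetical stand-ins chosen for illustration, not vLLM's actual modular-kernel classes:

```python
import torch


def _pad_to_block(x: torch.Tensor, block_size: int) -> torch.Tensor:
    # Toy stand-in for real prepare work: pad the token dim up to a
    # multiple of block_size.
    pad = (-x.shape[0]) % block_size
    return torch.nn.functional.pad(x, (0, 0, 0, pad)) if pad else x


class ExamplePrepareAndFinalizeOld:
    """Hypothetical 'before' shape: every call threads an extra-args dict."""

    def prepare(self, hidden_states: torch.Tensor,
                extra_prepare_args: dict) -> torch.Tensor:
        # Each call site has to build and pass extra_prepare_args.
        return _pad_to_block(hidden_states, extra_prepare_args["block_size"])


class ExamplePrepareAndFinalize:
    """Hypothetical 'after' shape: parameters captured at construction."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size  # captured once, not per call

    def prepare(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The method signature stays small and uniform across kernels.
        return _pad_to_block(hidden_states, self.block_size)
```

Capturing parameters once at construction keeps every modular kernel's `prepare`/`finalize` signature small and uniform, which is presumably what allows the follow-up PR to shrink the `apply`/`forward` argument lists further.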
Test plan

- Ran the `tests/kernels/moe` tests.
- Ran `nvidia/DeepSeek-R1-FP4` with different combinations of DP>=1 and TP>=1. Checked lm_eval results.

Test Result
With `nvidia/DeepSeek-R1-FP4` at DP=4, TP=1, EP=True, the engine seems to lock up. Verified this happens on main also as of 9266d98. I think this might be fixed by Fix Flashinfer CUTLASS MOE Allgather #21963. cc @varun-sundar-rabindranath, @ElizaWszola