@bnellnm bnellnm commented Jul 31, 2025

Purpose

  • Pass FusedMoEConfig to all FusedMoEMethodBase object constructors
  • Make sure self.fused_experts is set to None in the constructor and only set once by init_prepare_finalize.
  • Make sure topk_indices_dtype is initialized to None and used in all select_experts calls
  • Remove extra_* arguments to modular kernels by capturing relevant parameters at construction time.
  • Allow FusedMoEMethodBase subclasses to select prepare/finalize objects before deferring to the default mechanism.
  • Call init_prepare_finalize whenever EP is enabled, whether or not DP > 1.
  • Add csrc/quantization/cutlass_w8a8/moe to the Buildkite YAML file.
  • Refactor modular kernel tests to handle nvfp4 and make it easier to add new classes/types.
  • Add flashinfer tests to test_modular_kernel_combinations.py
  • Add test_flashinfer_moe.py test for FlashInferExperts
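
The construction-time pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the vLLM implementation: `MoEConfigSketch`, `MethodBaseSketch`, `ModularKernelSketch`, `CustomMethodSketch`, and the `select_prepare_finalize` hook are hypothetical names, and the string values and arithmetic are toy placeholders. Only `fused_experts`, `topk_indices_dtype`, and `init_prepare_finalize` are names taken from this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MoEConfigSketch:
    """Hypothetical stand-in for FusedMoEConfig; fields are illustrative."""
    num_experts: int
    use_ep: bool = False


class ModularKernelSketch:
    """Toy kernel that captures its parameters at construction time,
    so the call itself needs no extra_* arguments."""

    def __init__(self, prepare_finalize: str, num_experts: int):
        self.prepare_finalize = prepare_finalize
        self.num_experts = num_experts

    def __call__(self, hidden: list) -> list:
        # Placeholder arithmetic; a real kernel would run the experts here.
        return [x / self.num_experts for x in hidden]


class MethodBaseSketch:
    """Hypothetical stand-in for FusedMoEMethodBase."""

    def __init__(self, moe: MoEConfigSketch):
        self.moe = moe
        # Both start as None; fused_experts is assigned exactly once below.
        self.fused_experts: Optional[ModularKernelSketch] = None
        self.topk_indices_dtype: Optional[str] = None

    def select_prepare_finalize(self) -> Optional[str]:
        """Hypothetical hook: return a choice, or None to use the default."""
        return None

    def init_prepare_finalize(self) -> None:
        assert self.fused_experts is None, "fused_experts is set only once"
        choice = self.select_prepare_finalize() or "default"
        self.fused_experts = ModularKernelSketch(choice, self.moe.num_experts)


class CustomMethodSketch(MethodBaseSketch):
    """Subclass that picks its own prepare/finalize before the default."""

    def select_prepare_finalize(self) -> Optional[str]:
        return "custom"
```

In this sketch a caller would invoke `init_prepare_finalize()` once per method object, for example only when `use_ep` is true; a second call trips the assertion, which mirrors the "only set once" rule in the list.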

A follow-up PR will capture more parameters at construction time to reduce the size of the apply/forward argument lists.

Test plan

  • Ran tests/kernels/moe
  • Ran nvidia/DeepSeek-R1-FP4 with different combinations of DP>=1 and TP>=1. Checked lm_eval results.

Test Result

cc @varun-sundar-rabindranath , @ElizaWszola

@mergify mergify bot added the documentation and ci/build labels Jul 31, 2025
mergify bot commented Jul 31, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 31, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and valuable refactoring of the Fused MoE (Mixture of Experts) kernels. The core of the change is the cleanup of FusedMoEMethodBase and the removal of extra_..._args from various modular kernel methods. This is achieved by moving configuration into instance attributes, primarily through the new __init__ method in FusedMoEMethodBase, which now takes a FusedMoEConfig object. This change improves code clarity and maintainability by adopting a more object-oriented approach.

I've found one critical issue where the new logic could lead to an AssertionError and a crash under specific configurations related to FlashInfer kernels. A fix has been suggested. Besides this, the refactoring appears solid and consistent across the codebase.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@bnellnm bnellnm requested a review from hmellor as a code owner August 1, 2025 16:50
@varun-sundar-rabindranath varun-sundar-rabindranath left a comment

Very nice and much-needed set of cleanups!! Thanks @bnellnm

mergify bot commented Aug 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 8, 2025
@bnellnm bnellnm force-pushed the refactor branch 2 times, most recently from a8a9e86 to 0bff038 Compare August 8, 2025 15:48
@mergify mergify bot removed the needs-rebase label Aug 8, 2025
mergify bot commented Aug 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Bill Nell <bnell@redhat.com>
@mgoin mgoin left a comment

Great work Bill, looking forward to the follow-up packaging!

@ProExpertProg ProExpertProg merged commit 8ad7285 into vllm-project:main Aug 15, 2025
72 checks passed
666even666 pushed a commit to 666even666/vllm that referenced this pull request Aug 18, 2025
…e extra arguments from modular kernel methods. (vllm-project#22035)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Yiwen Chen <yiwen66@berkeley.edu>
yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Aug 19, 2025
…e extra arguments from modular kernel methods. (vllm-project#22035)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
divakar-amd pushed a commit to divakar-amd/vllm_upstream that referenced this pull request Aug 20, 2025
…e extra arguments from modular kernel methods. (vllm-project#22035)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
djmmoss pushed a commit to djmmoss/vllm that referenced this pull request Aug 21, 2025
…e extra arguments from modular kernel methods. (vllm-project#22035)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
…e extra arguments from modular kernel methods. (vllm-project#22035)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
…e extra arguments from modular kernel methods. (vllm-project#22035)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
…e extra arguments from modular kernel methods. (vllm-project#22035)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
@bnellnm bnellnm deleted the refactor branch September 20, 2025 00:29
Labels

ci/build, documentation, gpt-oss, ready