[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. #22035
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review

This pull request introduces a significant and valuable refactoring of the Fused MoE (Mixture of Experts) kernels. The core of the change is the cleanup of `FusedMoEMethodBase` and the removal of `extra_..._args` from various modular kernel methods. This is achieved by moving configuration into instance attributes, primarily through the new `__init__` method in `FusedMoEMethodBase`, which now takes a `FusedMoEConfig` object. This change greatly improves code clarity and maintainability by adopting a more object-oriented approach.

I've found one critical issue where the new logic could lead to an `AssertionError` and a crash under specific configurations related to FlashInfer kernels. A fix has been suggested. Beyond this, the refactoring appears solid and consistent across the codebase.
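To make the pattern concrete, here is a minimal sketch of the constructor-based configuration the review describes. The class, attribute, and method names mirror the ones named in this PR (`FusedMoEMethodBase`, `FusedMoEConfig`, `fused_experts`, `topk_indices_dtype`, `init_prepare_finalize`), but the config fields and method bodies are assumptions for illustration, not the actual vLLM implementation:

```python
from typing import Optional

import torch


class FusedMoEConfig:
    """Stand-in for vLLM's FusedMoEConfig; these fields are assumptions."""

    def __init__(self, num_experts: int, use_ep: bool, dp_size: int) -> None:
        self.num_experts = num_experts
        self.use_ep = use_ep
        self.dp_size = dp_size


class FusedMoEMethodBase:
    """Sketch of the constructor pattern described in the review above."""

    def __init__(self, moe: FusedMoEConfig) -> None:
        # Configuration is captured once at construction time instead of
        # being threaded through every call as extra_*_args.
        self.moe = moe
        # Set to None here and assigned exactly once by
        # init_prepare_finalize(), so other methods never guess its state.
        self.fused_experts = None
        # Initialized to None and then passed to every select_experts call.
        self.topk_indices_dtype: Optional[torch.dtype] = None

    def init_prepare_finalize(self) -> None:
        # Per the PR, this runs whenever EP is enabled, whether or not
        # DP > 1; the real body is elided here.
        assert self.fused_experts is None
        ...
```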
Review thread on `vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py` (outdated, resolved).
Very nice and much-needed set of cleanups!! Thanks @bnellnm
Force-pushed from a8a9e86 to 0bff038.
Signed-off-by: Bill Nell <bnell@redhat.com>
Great work Bill, looking forward to the follow-up packaging!
[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (vllm-project#22035) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>
Purpose
- Pass `FusedMoEConfig` to all `FusedMoEMethodBase` object constructors.
- `self.fused_experts` is set to None in the constructor and only set once by `init_prepare_finalize`.
- `topk_indices_dtype` is initialized to None and used in all `select_experts` calls.
- Remove `extra_*` arguments to modular kernels by capturing the relevant parameters at construction time (see the sketch after this list).
- Allow `FusedMoEMethodBase` subclasses to select prepare/finalize objects before deferring to the default mechanism.
- Call `init_prepare_and_finalize` whenever EP is enabled, whether or not DP > 1.
- Add `csrc/quantization/cutlass_w8a8/moe` to the buildkite YAML file.
- Update `test_modular_kernel_combinations.py`.
- Add a `test_flashinfer_moe.py` test for `FlashInferExperts`.

A follow-up PR will capture more parameters at construction time to reduce the size of the `apply`/`forward` argument lists.
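The following before/after sketch illustrates the `extra_*` removal referenced above. `ExamplePrepareAndFinalize`, `block_size`, and the padding helper are hypothetical stand-ins chosen for illustration, not vLLM's actual modular-kernel classes:

```python
import torch


def _pad_to_block(x: torch.Tensor, block_size: int) -> torch.Tensor:
    # Toy stand-in for real prepare work: pad the token dim up to a
    # multiple of block_size.
    pad = (-x.shape[0]) % block_size
    return torch.nn.functional.pad(x, (0, 0, 0, pad)) if pad else x


class ExamplePrepareAndFinalizeOld:
    """Hypothetical 'before' shape: every call threads an extra-args dict."""

    def prepare(self, hidden_states: torch.Tensor,
                extra_prepare_args: dict) -> torch.Tensor:
        # Each call site has to build and pass extra_prepare_args.
        return _pad_to_block(hidden_states, extra_prepare_args["block_size"])


class ExamplePrepareAndFinalize:
    """Hypothetical 'after' shape: parameters captured at construction."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size  # captured once, not per call

    def prepare(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The method signature stays small and uniform across kernels.
        return _pad_to_block(hidden_states, self.block_size)
```

Capturing parameters once at construction keeps every modular kernel's `prepare`/`finalize` signature small and uniform, which is presumably what allows the follow-up PR to shrink the `apply`/`forward` argument lists further.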
Test plan

- Ran the `tests/kernels/moe` tests.
- Ran `nvidia/DeepSeek-R1-FP4` with different combinations of DP>=1 and TP>=1. Checked lm_eval results.

Test Result
With `nvidia/DeepSeek-R1-FP4` at DP=4, TP=1, EP=True, the engine seems to lock up. Verified this happens on main also as of 9266d98. I think this might be fixed by Fix Flashinfer CUTLASS MOE Allgather #21963. cc @varun-sundar-rabindranath, @ElizaWszola