
add super kernel for decode moe #2157


Open

NNUCJ wants to merge 1 commit into main from super_kernel_main

Conversation

@NNUCJ commented Aug 1, 2025

What this PR does / why we need it?

  This PR uses the super kernel feature to fuse several operators in the MoE stage, reducing scheduling overhead on device. enable_super_kernel takes effect only when both Torchair graph mode and enable_multistream_moe are enabled.
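
For reference, a minimal launch sketch of how this path might be enabled from user code; the additional_config keys mirror the ascend_config.torchair_graph_config flags referenced in the diff below, but the exact schema is an assumption, not part of this PR:

    from vllm import LLM

    # Hedged sketch: key names follow the torchair_graph_config flags this PR
    # reads; verify the exact schema against the vLLM Ascend documentation.
    llm = LLM(
        model="/path/to/model",
        tensor_parallel_size=1,  # super kernel requires TP1 (see the review discussion below)
        additional_config={
            "torchair_graph_config": {
                "enabled": True,                 # Torchair graph mode
                "enable_multistream_moe": True,  # prerequisite for enable_super_kernel
            },
        },
    )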

Does this PR introduce any user-facing change?

How was this patch tested?

@NNUCJ NNUCJ force-pushed the super_kernel_main branch from b01c856 to 4b8a07d Compare August 1, 2025 03:10
@NNUCJ NNUCJ changed the title from "add super kernel for deocode moe" to "add super kernel for decode moe" Aug 1, 2025

github-actions bot commented Aug 1, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a commit message that fills out the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.


codecov bot commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 63.63636% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.39%. Comparing base (1a70564) to head (12d9aec).
⚠️ Report is 7 commits behind head on main.

Files with missing lines                    Patch %   Lines
vllm_ascend/quantization/w8a8_dynamic.py     6.66%    14 Missing ⚠️
vllm_ascend/ops/fused_moe.py                83.33%     2 Missing ⚠️

❌ Your patch check has failed because the patch coverage (63.63%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2157      +/-   ##
==========================================
+ Coverage   76.35%   76.39%   +0.04%     
==========================================
  Files         117      117              
  Lines       13371    13394      +23     
==========================================
+ Hits        10209    10233      +24     
+ Misses       3162     3161       -1     
Flag        Coverage Δ
unittests   76.39% <63.63%> (+0.04%) ⬆️


github-actions bot commented Aug 2, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@NNUCJ NNUCJ force-pushed the super_kernel_main branch from cbf7dc1 to 9e5c393 Compare August 2, 2025 01:57
@NNUCJ NNUCJ force-pushed the super_kernel_main branch 2 times, most recently from 112d01d to 32e6df8 Compare August 4, 2025 08:34
@NNUCJ NNUCJ force-pushed the super_kernel_main branch from 32e6df8 to 1524247 Compare August 4, 2025 11:20
@NNUCJ NNUCJ force-pushed the super_kernel_main branch 3 times, most recently from 709d83e to cd98f24 Compare August 6, 2025 01:20
@realliujiaxu

Any benchmark result or profiling timeline?

@NNUCJ (Author) commented Aug 6, 2025

> Any benchmark result or profiling timeline?

  1. baseline: [profiling timeline screenshot]
  2. with super_kernel: [profiling timeline screenshot]

@NNUCJ NNUCJ force-pushed the super_kernel_main branch 5 times, most recently from b4e3fb1 to f1ee87d Compare August 6, 2025 08:50

github-actions bot commented Aug 7, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@NNUCJ NNUCJ force-pushed the super_kernel_main branch from f1ee87d to 9aa52b0 Compare August 7, 2025 01:31
@NNUCJ NNUCJ force-pushed the super_kernel_main branch 4 times, most recently from d149510 to 4d64124 Compare August 8, 2025 01:27
@@ -315,15 +315,26 @@ def __init__(
        self.enable_multistream_moe = \
            ascend_config.torchair_graph_config.enable_multistream_moe and \
            self.torchair_graph_enabled
        self.enable_super_kernel = self.enable_multistream_moe and self.tp_size == 1
Collaborator


Why can only TP1 use the super_kernel?

Author


When the TP size is greater than 1, some StridedSlice operators are introduced in fused_moe, which interrupts super kernel fusion.
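
For illustration only (not code from this PR), a minimal sketch of the kind of tensor-parallel slicing that appears inside the fused-MoE graph; the helper name and shapes are hypothetical:

    import torch

    def tp_shard(hidden_states: torch.Tensor, tp_rank: int, tp_size: int) -> torch.Tensor:
        # With tp_size > 1, each rank narrows out its shard of the hidden dim.
        # In the traced graph this narrow() lowers to a StridedSlice node that
        # sits between the MoE operators, so the super kernel cannot fuse
        # across it. With tp_size == 1 no slice is emitted at all.
        if tp_size == 1:
            return hidden_states
        shard = hidden_states.shape[-1] // tp_size
        return hidden_states.narrow(-1, tp_rank * shard, shard)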

@@ -315,15 +315,26 @@ def __init__(
        self.enable_multistream_moe = \
            ascend_config.torchair_graph_config.enable_multistream_moe and \
            self.torchair_graph_enabled
        self.enable_super_kernel = self.enable_multistream_moe and self.tp_size == 1
        self.params_dtype = torch.get_default_dtype()
Collaborator


Use the following suggestion to avoid duplicated code?

Suggested change
- self.params_dtype = torch.get_default_dtype()
+ self.params_dtype = torch.float32 if self.enable_super_kernel else torch.get_default_dtype()

Author


Thank you for your suggestion.

@@ -96,3 +97,7 @@ def npu_wait_tensor(self: torch.Tensor,
                    *,
                    enabled: bool = True):
    return _npu_wait_tensor(self, dependency) if enabled else self


def super_kernel(prefix: str, stream: str, enabled: bool = True):
Collaborator


Suggested change
- def super_kernel(prefix: str, stream: str, enabled: bool = True):
+ def super_kernel(prefix: str, options: str, enabled: bool = True):
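
A minimal sketch of how this helper could be completed, mirroring the _npu_wait_tensor fallback pattern above; the torchair import path is an assumption and may differ from what the PR actually uses:

    from contextlib import nullcontext

    try:
        # Assumed torchair scope API, imported in the same style as _npu_wait_tensor.
        from torchair.scope import super_kernel as _super_kernel
    except ImportError:
        _super_kernel = None

    def super_kernel(prefix: str, options: str, enabled: bool = True):
        # Open a super-kernel scope so operators traced inside it are fused
        # into one launch; fall back to a no-op context when disabled or
        # when torchair does not provide the scope.
        if enabled and _super_kernel is not None:
            return _super_kernel(prefix, options)
        return nullcontext()

A call site in the MoE path would then wrap the fused region, e.g. with super_kernel(self.prefix, "stream-fusion=1", enabled=self.enable_super_kernel): — both the prefix attribute and the options string here are illustrative.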

@NNUCJ NNUCJ force-pushed the super_kernel_main branch from 4d64124 to 8b7eab5 Compare August 12, 2025 01:51

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: NNUCJ <616151263@qq.com>

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wangxiyuan
Collaborator

please rebase to fix the merge conflict if this PR is still needed.
