add super kernel for decode moe #2157
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Codecov Report
❌ Patch coverage is 63.63%. Your patch check has failed because the patch coverage (63.63%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

@@            Coverage Diff            @@
##             main    #2157     +/-  ##
==========================================
+ Coverage   76.35%   76.39%   +0.04%
==========================================
  Files         117      117
  Lines       13371    13394      +23
==========================================
+ Hits        10209    10233      +24
+ Misses       3162     3161       -1

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Any benchmark result or profiling timeline?
@@ -315,15 +315,26 @@ def __init__(
        self.enable_multistream_moe = \
            ascend_config.torchair_graph_config.enable_multistream_moe and \
            self.torchair_graph_enabled
        self.enable_super_kernel = self.enable_multistream_moe and self.tp_size == 1
Why can only TP1 use the super_kernel?
When the TP size is greater than 1, some StridedSlice operators are introduced in fused_moe, which interrupts the fusion of the super kernel.
vllm_ascend/models/deepseek_v2.py
Outdated
@@ -315,15 +315,26 @@ def __init__(
        self.enable_multistream_moe = \
            ascend_config.torchair_graph_config.enable_multistream_moe and \
            self.torchair_graph_enabled
        self.enable_super_kernel = self.enable_multistream_moe and self.tp_size == 1
        self.params_dtype = torch.get_default_dtype()
use the following suggestion to avoid duplicated code?
- self.params_dtype = torch.get_default_dtype()
+ self.params_dtype = torch.float32 if self.enable_super_kernel else torch.get_default_dtype()
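This folds the float32 override into a single conditional assignment, so the default-dtype path is left untouched whenever the super kernel is disabled.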
Thank you for your suggestion.
vllm_ascend/torchair/utils.py
Outdated
@@ -96,3 +97,7 @@ def npu_wait_tensor(self: torch.Tensor,
                    *,
                    enabled: bool = True):
    return _npu_wait_tensor(self, dependency) if enabled else self


def super_kernel(prefix: str, stream: str, enabled: bool = True):
- def super_kernel(prefix: str, stream: str, enabled: bool = True):
+ def super_kernel(prefix: str, options: str, enabled: bool = True):
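For reference, a minimal sketch of how such a conditionally enabled helper could look, mirroring the npu_wait_tensor pattern above; the `_super_kernel` import path and signature are assumptions for illustration, not confirmed by this diff:

```python
from contextlib import nullcontext

# Assumption: torchair exposes a super-kernel scope context manager;
# the import path and signature here are illustrative only.
from torchair.scope import super_kernel as _super_kernel


def super_kernel(prefix: str, options: str, enabled: bool = True):
    # Same conditional-enable pattern as npu_wait_tensor above: enter
    # the super-kernel fusion scope only when enabled, otherwise fall
    # back to a no-op context manager.
    return _super_kernel(prefix, options) if enabled else nullcontext()
```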
Signed-off-by: NNUCJ <616151263@qq.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Please rebase to fix the merge conflict if this PR is still needed.
What this PR does / why we need it?
Using the super kernel feature to fuse some operators in the MoE stage reduces scheduling overhead on devices. enable_super_kernel is valid only when Torchair graph mode and enable_multistream_moe are both enabled.
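As an illustration only, a hedged sketch of how a MoE forward pass might wrap its operators in the new helper; the prefix/option strings and the gate/experts attributes are placeholders, not taken from this PR:

```python
import torch


def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    # Hypothetical usage: run the decode-MoE operators inside the
    # super-kernel scope so the device scheduler sees them as a single
    # fused launch. Prefix and option strings below are placeholders.
    with super_kernel("deepseek_moe", "stream-fusion=1",
                      enabled=self.enable_super_kernel):
        router_logits = self.gate(hidden_states)  # expert routing
        hidden_states = self.experts(hidden_states, router_logits)
    return hidden_states
```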
Does this PR introduce any user-facing change?
How was this patch tested?