add super kernel for decode moe #2157
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Codecov Report
❌ Patch coverage is 63.63%. Your patch check has failed because the patch coverage (63.63%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

@@            Coverage Diff            @@
##             main    #2157     +/-  ##
==========================================
+ Coverage   76.35%   76.39%   +0.04%
==========================================
  Files         117      117
  Lines       13371    13394      +23
==========================================
+ Hits        10209    10233      +24
+ Misses       3162     3161       -1

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Any benchmark result or profiling timeline?
@@ -315,15 +315,26 @@ def __init__(
        self.enable_multistream_moe = \
            ascend_config.torchair_graph_config.enable_multistream_moe and \
            self.torchair_graph_enabled
        self.enable_super_kernel = self.enable_multistream_moe and self.tp_size == 1
Why can only TP1 use the super_kernel?
When the TP size is greater than 1, some StridedSlice operators are introduced in fused_moe, which interrupts the fusion of the super kernel.
vllm_ascend/models/deepseek_v2.py
Outdated
@@ -315,15 +315,26 @@ def __init__(
        self.enable_multistream_moe = \
            ascend_config.torchair_graph_config.enable_multistream_moe and \
            self.torchair_graph_enabled
        self.enable_super_kernel = self.enable_multistream_moe and self.tp_size == 1
        self.params_dtype = torch.get_default_dtype()
use the following suggestion to avoid duplicated code?
- self.params_dtype = torch.get_default_dtype()
+ self.params_dtype = torch.float32 if self.enable_super_kernel else torch.get_default_dtype()
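This folds the float32 override into a single conditional assignment, so the default-dtype path is left untouched whenever the super kernel is disabled.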
Thank you for your suggestion.
vllm_ascend/torchair/utils.py
Outdated
@@ -96,3 +97,7 @@ def npu_wait_tensor(self: torch.Tensor,
                    *,
                    enabled: bool = True):
    return _npu_wait_tensor(self, dependency) if enabled else self


def super_kernel(prefix: str, stream: str, enabled: bool = True):
- def super_kernel(prefix: str, stream: str, enabled: bool = True):
+ def super_kernel(prefix: str, options: str, enabled: bool = True):
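For reference, a minimal sketch of how such a conditionally enabled helper could look, mirroring the npu_wait_tensor pattern above; the `_super_kernel` import path and signature are assumptions for illustration, not confirmed by this diff:

```python
from contextlib import nullcontext

# Assumption: torchair exposes a super-kernel scope context manager;
# the import path and signature here are illustrative only.
from torchair.scope import super_kernel as _super_kernel


def super_kernel(prefix: str, options: str, enabled: bool = True):
    # Same conditional-enable pattern as npu_wait_tensor above: enter
    # the super-kernel fusion scope only when enabled, otherwise fall
    # back to a no-op context manager.
    return _super_kernel(prefix, options) if enabled else nullcontext()
```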
Signed-off-by: NNUCJ <616151263@qq.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Please rebase to fix the merge conflict if this PR is still needed.
What this PR does / why we need it?
Using the super kernel feature to fuse some operators in the MoE stage reduces scheduling overhead on devices. enable_super_kernel is valid only when Torchair graph mode and enable_multistream_moe are both enabled.
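As an illustration only, a hedged sketch of how a MoE forward pass might wrap its operators in the new helper; the prefix/option strings and the gate/experts attributes are placeholders, not taken from this PR:

```python
import torch


def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    # Hypothetical usage: run the decode-MoE operators inside the
    # super-kernel scope so the device scheduler sees them as a single
    # fused launch. Prefix and option strings below are placeholders.
    with super_kernel("deepseek_moe", "stream-fusion=1",
                      enabled=self.enable_super_kernel):
        router_logits = self.gate(hidden_states)  # expert routing
        hidden_states = self.experts(hidden_states, router_logits)
    return hidden_states
```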
Does this PR introduce any user-facing change?
How was this patch tested?