[2/N][refactor] torchair deepseek mla backend refactor #2459
Conversation
Code Review
This pull request refactors the attention backend selection logic and introduces a new TorchAir MLA backend for DeepSeek models on Ascend NPUs. The refactoring in platform.py correctly handles the selection of the different attention backends, and the new implementation in vllm_ascend/torchair/torchair_mla.py adds the TorchAir-based MLA backend. I've found a critical issue in the decode path of this new implementation that needs to be addressed.
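As a rough illustration of the selection logic this review refers to (a minimal sketch only; the function signature and the specific class names and dotted paths, apart from torchair_mla.py and mla_v1.py which are named in this PR, are assumptions rather than the actual vllm-ascend code):

```python
# Hypothetical sketch of attention-backend selection on the Ascend platform.
# Argument names and returned class paths are illustrative assumptions.
def get_attn_backend_cls(use_mla: bool, use_torchair_graph: bool) -> str:
    if use_mla and use_torchair_graph:
        # DeepSeek MLA running through the TorchAir graph backend.
        return "vllm_ascend.torchair.torchair_mla.AscendMLATorchairBackend"
    if use_mla:
        # Eager-mode MLA backend.
        return "vllm_ascend.attention.mla_v1.AscendMLABackend"
    # Fallback: the regular (non-MLA) Ascend attention backend.
    return "vllm_ascend.attention.attention_v1.AscendAttentionBackend"
```

Returning the backend as a dotted-path string keeps the platform module free of heavy imports; the concrete backend class is only imported once it is actually selected.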
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Force-pushed from 9d75ec4 to 5c278a8
Force-pushed from 6ad2f4d to 0b5b5b2
Codecov Report
❌ Your patch check has failed because the patch coverage (79.62%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #2459      +/-   ##
==========================================
+ Coverage   76.18%   77.56%   +1.37%
==========================================
  Files         120      130      +10
  Lines       13532    17149    +3617
==========================================
+ Hits        10310    13302    +2992
- Misses       3222     3847     +625
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Force-pushed from 582e3b8 to 9b0233d
Force-pushed from afd1b79 to 2d4e437
Signed-off-by: linfeng-yuan <1102311262@qq.com>
Force-pushed from 2d4e437 to d8671e8
The CI failure is not related to this PR.
What this PR does / why we need it?
This PR moves the current unified MLA backend into the torchair folder and removes the torchair-related code from attention/mla_v1.py (roughly 1.3k -> 0.9k lines).
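To illustrate the shape of this split (a hedged sketch; the class and method names below are hypothetical and only indicate where the eager and TorchAir code paths end up, they are not the actual code in this PR):

```python
# Sketch: the eager MLA path stays in attention/mla_v1.py, while the
# torchair-specific behavior moves into a subclass under
# vllm_ascend/torchair/torchair_mla.py. All names here are illustrative.

class AscendMLAMetadataBuilder:
    """Eager-mode MLA metadata builder (attention/mla_v1.py)."""

    def build(self, num_reqs: int, num_tokens: int) -> dict:
        # Build prefill/decode metadata for the eager MLA path only.
        return {"num_reqs": num_reqs, "num_tokens": num_tokens}


class AscendMLATorchairMetadataBuilder(AscendMLAMetadataBuilder):
    """TorchAir graph-mode builder (torchair/torchair_mla.py)."""

    def build(self, num_reqs: int, num_tokens: int) -> dict:
        metadata = super().build(num_reqs, num_tokens)
        # Graph-mode-specific handling (e.g. padding the batch to a captured
        # graph size) that previously sat behind torchair checks in mla_v1.py.
        metadata["graph_mode"] = True
        return metadata
```

Keeping the torchair branches in a dedicated module means mla_v1.py only has to cover the eager path, which is what shrinks it from about 1.3k to 0.9k lines.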
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested by running eager mode with the MLA backend, and torchair mode with the code before #2445.