[Refactor][WIP] Refactor mla_v1 by moving all MLA preprocessing ops into mla_v1 attention impl. #2363
base: main
Conversation
Code Review
This pull request refactors the MLA v1 attention implementation by moving all preprocessing operations into the AscendMLAImpl class. This is a good architectural improvement that centralizes the attention logic and simplifies the model code. However, I've found two critical bugs in the new implementation in vllm_ascend/attention/mla_v1.py that will cause runtime errors due to incorrect function calls. Please see the detailed comments for fixes.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
In order to support fused kernels, multi-stream execution, communication optimization, etc., it is better to aggregate all operations of the Attention layer together. This PR refactors mla_v1 by moving all MLA preprocessing ops into the mla_v1 attention impl.
Later I will provide a diagram showing the structure of the refactored mla_v1.
Note that the new mla_v1 does not take torchair into consideration, so this PR can only be merged after the torchair-related mla_v1 is isolated into a new file. A rough sketch of the intended structure is given below.
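For context, here is a minimal, hypothetical PyTorch sketch (not the actual vllm_ascend code) of what "all MLA preprocessing inside the attention impl" can look like: the latent down-projection, up-projection, and head reshaping live in the impl's forward, so the model layer only hands over hidden states. All class, parameter, and shape names below are illustrative assumptions.

```python
import torch
import torch.nn as nn


class AscendMLAImplSketch(nn.Module):
    """Illustrative sketch only: MLA preprocessing (latent KV down-projection,
    up-projection, head split) is owned by the attention impl instead of the
    model code, making it easier to fuse kernels or overlap ops on a second
    stream later. Names and shapes are hypothetical, not the real API."""

    def __init__(self, hidden_size: int, kv_lora_rank: int,
                 num_heads: int, head_dim: int) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = head_dim
        # Preprocessing projections that previously lived in the model layer.
        self.q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
        self.kv_down_proj = nn.Linear(hidden_size, kv_lora_rank, bias=False)
        self.kv_up_proj = nn.Linear(kv_lora_rank, 2 * num_heads * head_dim,
                                    bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # All preprocessing happens inside the impl: project to the compressed
        # latent, expand to K/V, split heads, then run attention.
        bsz, seq_len, _ = hidden_states.shape
        q = self.q_proj(hidden_states)
        kv_latent = self.kv_down_proj(hidden_states)   # compressed KV latent
        k, v = self.kv_up_proj(kv_latent).chunk(2, dim=-1)

        def split_heads(x: torch.Tensor) -> torch.Tensor:
            return x.view(bsz, seq_len, self.num_heads,
                          self.head_dim).transpose(1, 2)

        out = torch.nn.functional.scaled_dot_product_attention(
            split_heads(q), split_heads(k), split_heads(v))
        return out.transpose(1, 2).reshape(bsz, seq_len, -1)


if __name__ == "__main__":
    impl = AscendMLAImplSketch(hidden_size=256, kv_lora_rank=64,
                               num_heads=4, head_dim=64)
    x = torch.randn(2, 8, 256)
    print(impl(x).shape)  # torch.Size([2, 8, 256])
```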