[Refactor][WIP] Refactor mla_v1 by moving all MLA preprocessing ops into mla_v1 attention impl. #2363


Open · wants to merge 7 commits into base: main
Conversation

@whx-sjtu (Contributor) commented Aug 14, 2025

In order to support fused kernels, multi-stream execution, communication optimization, etc., it is better to aggregate all operations of the attention layer together. This PR refactors mla_v1 by moving all MLA preprocessing ops into the mla_v1 attention impl.
Later I will provide a diagram showing the structure of the refactored mla_v1.
Note that the new mla_v1 does not take torchair into consideration, so this PR can only be merged after the torchair-related mla_v1 is isolated into a new file.
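To make the shape of the change concrete, here is a minimal sketch, not taken from the PR: the class name `AscendMLAImplSketch`, the projection names, and the ranks are hypothetical illustrations of "preprocessing moves from the model layer into the attention impl's forward". The real `AscendMLAImpl` in vllm_ascend/attention/mla_v1.py is considerably more involved.

```python
import torch
from torch import nn


class AscendMLAImplSketch(nn.Module):
    """Hypothetical stand-in for the refactored attention impl.

    Projections like these previously ran in the model layer; after the
    refactor they run inside the attention impl's forward, so fused
    kernels, multi-stream scheduling, and communication overlap can be
    applied to the whole preprocessing-plus-attention path in one place.
    """

    def __init__(self, hidden_size: int, q_lora_rank: int, kv_lora_rank: int):
        super().__init__()
        # Illustrative MLA down-projections; the real mla_v1 has more ops
        # (RoPE split, up-projections, kv-cache writes, etc.).
        self.q_a_proj = nn.Linear(hidden_size, q_lora_rank, bias=False)
        self.kv_a_proj = nn.Linear(hidden_size, kv_lora_rank, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Preprocessing now happens here, inside the attention impl.
        q_latent = self.q_a_proj(hidden_states)
        kv_latent = self.kv_a_proj(hidden_states)
        # ... the attention core would consume q_latent / kv_latent here ...
        return torch.cat([q_latent, kv_latent], dim=-1)  # placeholder output


# Usage: the model layer now only forwards hidden states.
impl = AscendMLAImplSketch(hidden_size=128, q_lora_rank=32, kv_lora_rank=16)
out = impl(torch.randn(2, 4, 128))
```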

@gemini-code-assist (bot) left a comment


Code Review

This pull request refactors the MLA v1 attention implementation by moving all preprocessing operations into the AscendMLAImpl class. This is a good architectural improvement that centralizes the attention logic and simplifies the model code. However, I've found two critical bugs in the new implementation in vllm_ascend/attention/mla_v1.py that will cause runtime errors due to incorrect function calls. Please see the detailed comments for fixes.


This pull request has conflicts, please resolve those before we can evaluate the pull request.

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>
@github-actions github-actions bot added documentation Improvements or additions to documentation module:core and removed merge-conflicts labels Aug 14, 2025
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>
lwq and others added 2 commits August 18, 2025 16:53
Signed-off-by: lwq <liwenquan5@huawei.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: lwq <liwenquan5@huawei.com>
Labels
documentation (Improvements or additions to documentation) · merge-conflicts · module:core · module:tests