
Conversation

@MengqingCao (Collaborator) commented Oct 16, 2025

What this PR does / why we need it?

  1. cherry-pick [CI] Upgrade vllm to newest commit #3423
  2. refactor the DeepSeek code
    • separate the DeepSeek V3.2 modeling code from the other DeepSeek versions
    • use the latest MLA code architecture in vLLM

How was this patch tested?

Tested locally; passed with AscendScheduler enabled in both torchair and eager mode on DeepSeek-V3.2-Exp-W8A8.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally, following the Contributing and Testing guides.

@gemini-code-assist bot (Contributor) left a comment
Code Review

This pull request refactors the codebase to align with upstream vllm main, introducing version compatibility checks across multiple files. The changes are extensive and mostly correct. However, I've identified a critical issue in a test case that could break CI for a specific vllm version, and a code duplication issue that should be addressed for better maintainability.

Comment on lines 344 to 355
        self.assertEqual(
            vllm_config.compilation_config.level,
-           CompilationLevel.NO_COMPILATION,
+           CompilationMode.NONE,
        )

critical

The assertion for vllm_config.compilation_config.level is incorrect when vllm_version_is("0.11.0") is true. The test will fail in that case. The assertion should be conditional, similar to other tests in this file, to check for CompilationLevel.NO_COMPILATION for vllm 0.11.0 and CompilationMode.NONE otherwise.

            if vllm_version_is("0.11.0"):
                self.assertEqual(
                    vllm_config.compilation_config.level,
                    CompilationLevel.NO_COMPILATION,
                )
            else:
                self.assertEqual(
                    vllm_config.compilation_config.level,
                    CompilationMode.NONE,
                )
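
The version-gated assertion pattern suggested above can be sketched standalone. In this sketch, `CompilationLevel`, `CompilationMode`, and `vllm_version_is` are simplified stand-ins for the real vLLM / vllm-ascend symbols, just to show how the expected sentinel is chosen per version:

```python
from enum import Enum

class CompilationLevel(Enum):   # stand-in for the pre-0.11.1 vLLM enum
    NO_COMPILATION = 0

class CompilationMode(Enum):    # stand-in for the newer vLLM enum
    NONE = 0

def vllm_version_is(version: str, current: str = "0.11.0") -> bool:
    # Stand-in for the version-check helper used in the vllm-ascend tests.
    return current == version

# Pick the expected sentinel based on the installed vLLM version,
# mirroring the if/else structure of the suggested fix.
expected = (CompilationLevel.NO_COMPILATION
            if vllm_version_is("0.11.0")
            else CompilationMode.NONE)
print(expected)  # -> CompilationLevel.NO_COMPILATION
```

With the real symbols, the same branch selects which enum the test asserts against, so the test passes on both vLLM 0.11.0 and newer commits.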

Comment on lines +2340 to 2531
    # In multi-DP scenarios, there may be situations where all DP groups are executing dummy runs.
    # If sequence parallelism is enabled, it is essential to ensure that num_tokens is divisible by tp_size.
    if self.use_aclgraph and enable_sp(self.vllm_config):
        tp_size = self.vllm_config.parallel_config.tensor_parallel_size
        num_tokens = math.ceil(num_tokens / tp_size) * tp_size

    # In multi-DP scenarios, there may be situations where all DP groups are executing dummy runs.
    # If sequence parallelism is enabled, it is essential to ensure that num_tokens is divisible by tp_size.
    if self.use_aclgraph and enable_sp(self.vllm_config):

high

This block of code, which pads num_tokens for sequence parallelism, appears to be duplicated. The second block should be removed to avoid redundancy.

Suggested change

      # In multi-DP scenarios, there may be situations where all DP groups are executing dummy runs.
      # If sequence parallelism is enabled, it is essential to ensure that num_tokens is divisible by tp_size.
      if self.use_aclgraph and enable_sp(self.vllm_config):
          tp_size = self.vllm_config.parallel_config.tensor_parallel_size
          num_tokens = math.ceil(num_tokens / tp_size) * tp_size
-     # In multi-DP scenarios, there may be situations where all DP groups are executing dummy runs.
-     # If sequence parallelism is enabled, it is essential to ensure that num_tokens is divisible by tp_size.
-     if self.use_aclgraph and enable_sp(self.vllm_config):
-         tp_size = self.vllm_config.parallel_config.tensor_parallel_size
-         num_tokens = math.ceil(num_tokens / tp_size) * tp_size
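
Removing the duplicate is safe because the padding is idempotent: rounding an already-padded value changes nothing. A minimal standalone sketch of the rounding (the `pad_to_multiple` wrapper is hypothetical; the formula is taken verbatim from the quoted code):

```python
import math

def pad_to_multiple(num_tokens: int, tp_size: int) -> int:
    # Round num_tokens up to the nearest multiple of tp_size, as in the
    # quoted line: math.ceil(num_tokens / tp_size) * tp_size
    return math.ceil(num_tokens / tp_size) * tp_size

print(pad_to_multiple(10, 4))   # -> 12
print(pad_to_multiple(12, 4))   # -> 12 (already divisible, unchanged)
print(pad_to_multiple(1, 8))    # -> 8
```

Since `pad_to_multiple(pad_to_multiple(n, t), t) == pad_to_multiple(n, t)`, the duplicated block was a no-op on the second pass; deleting it only removes redundancy.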

@whx-sjtu (Collaborator) left a comment

The adaptation of the MLA part LGTM. It's OK for me to delete the deepseek_v2 modeling after this PR is merged.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@MengqingCao force-pushed the refactor_ds_1016_rebase_main branch from b8e0d5e to 85b1d42 on October 17, 2025 12:09
@MengqingCao added the ready (ready for review) and ready-for-test (start test by label for PR) labels on Oct 17, 2025
@MengqingCao force-pushed the refactor_ds_1016_rebase_main branch from 71aeb39 to 1ebbd54 on October 18, 2025 06:19
@MengqingCao (Collaborator, Author) commented Oct 18, 2025

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@MengqingCao force-pushed the refactor_ds_1016_rebase_main branch from 366fa84 to a1038da on October 19, 2025 06:48
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

MengqingCao and others added 9 commits October 20, 2025 03:51
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
  * fix bert model
  * fix guided decoding
  * revert skipped e2e test
  * fix lora vllm-project/vllm#25807
  * fix vl

Signed-off-by: MengqingCao <cmq0113@163.com>
@MengqingCao (Collaborator, Author)

The singlecard e2e tests only failed on test_mtp_torchair_correctness_piecewise and test_mtp_torchair_correctness_full (both reported as SKIPPED):

2025-10-18T15:07:07.2022784Z tests/e2e/singlecard/spec_decode_v1/test_v1_mtp_torchair_correctness.py::test_mtp_torchair_correctness_piecewise SKIPPED
2025-10-18T15:07:09.4163583Z tests/e2e/singlecard/spec_decode_v1/test_v1_mtp_torchair_correctness.py::test_mtp_torchair_correctness_full SKIPPED

in https://github.com/vllm-project/vllm-ascend/actions/runs/18616600778/job/53081964430?pr=3504

Signed-off-by: MengqingCao <cmq0113@163.com>
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: MengqingCao <cmq0113@163.com>
@MengqingCao (Collaborator, Author)

Closing, as this work is done in #3612.
