[Refactor][main CI] Refactor code to align with vllm main #3504
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request refactors the codebase to align with upstream vllm main, introducing version compatibility checks across multiple files. The changes are extensive and mostly correct. However, I've identified a critical issue in a test case that could break CI for a specific vllm version, and a code duplication issue that should be addressed for better maintainability.
```diff
 self.assertEqual(
     vllm_config.compilation_config.level,
-    CompilationLevel.NO_COMPILATION,
+    CompilationMode.NONE,
 )
```
The assertion for vllm_config.compilation_config.level is incorrect when vllm_version_is("0.11.0") is true. The test will fail in that case. The assertion should be conditional, similar to other tests in this file, to check for CompilationLevel.NO_COMPILATION for vllm 0.11.0 and CompilationMode.NONE otherwise.
```python
if vllm_version_is("0.11.0"):
    self.assertEqual(
        vllm_config.compilation_config.level,
        CompilationLevel.NO_COMPILATION,
    )
else:
    self.assertEqual(
        vllm_config.compilation_config.level,
        CompilationMode.NONE,
    )
```

```python
# In multi-DP scenarios, there may be situations where all DP groups are executing dummy runs.
# If sequence parallelism is enabled, it is essential to ensure that num_tokens is divisible by tp_size.
if self.use_aclgraph and enable_sp(self.vllm_config):
    tp_size = self.vllm_config.parallel_config.tensor_parallel_size
    num_tokens = math.ceil(num_tokens / tp_size) * tp_size

# In multi-DP scenarios, there may be situations where all DP groups are executing dummy runs.
# If sequence parallelism is enabled, it is essential to ensure that num_tokens is divisible by tp_size.
if self.use_aclgraph and enable_sp(self.vllm_config):
```
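For readers unfamiliar with the version-gating pattern used throughout this refactor, here is a minimal standalone sketch. The stub `vllm_version_is` and the enum values are stand-ins for illustration only, not the real vllm APIs:

```python
from enum import Enum

# Stand-in stubs for illustration only; the real names live in vllm.
class CompilationLevel(int, Enum):
    NO_COMPILATION = 0

class CompilationMode(int, Enum):
    NONE = 0

_CURRENT_VERSION = "0.11.0"  # assumed installed version for this demo

def vllm_version_is(version: str) -> bool:
    # The real helper compares against the installed vllm version.
    return _CURRENT_VERSION == version

# The expected compilation setting depends on which vllm is installed,
# so tests must branch rather than hard-code one value.
expected = (CompilationLevel.NO_COMPILATION
            if vllm_version_is("0.11.0")
            else CompilationMode.NONE)
print(expected.name)  # -> NO_COMPILATION under the assumed version
```

This mirrors the conditional-assertion fix suggested above: both branches assert on the same attribute, but against the enum that exists in the installed vllm.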
This block of code, which pads num_tokens for sequence parallelism, appears to be duplicated. The second block should be removed to avoid redundancy.
```diff
 # In multi-DP scenarios, there may be situations where all DP groups are executing dummy runs.
 # If sequence parallelism is enabled, it is essential to ensure that num_tokens is divisible by tp_size.
 if self.use_aclgraph and enable_sp(self.vllm_config):
     tp_size = self.vllm_config.parallel_config.tensor_parallel_size
     num_tokens = math.ceil(num_tokens / tp_size) * tp_size
-# In multi-DP scenarios, there may be situations where all DP groups are executing dummy runs.
-# If sequence parallelism is enabled, it is essential to ensure that num_tokens is divisible by tp_size.
-if self.use_aclgraph and enable_sp(self.vllm_config):
```
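The padding step itself is simple round-up arithmetic. A minimal standalone sketch (the helper name `pad_to_multiple` is my own, not from the PR):

```python
import math

def pad_to_multiple(num_tokens: int, tp_size: int) -> int:
    """Round num_tokens up to the nearest multiple of tp_size,
    so a sequence-parallel split divides evenly across TP ranks."""
    return math.ceil(num_tokens / tp_size) * tp_size

print(pad_to_multiple(9, 4))  # 9 rounds up to 12
print(pad_to_multiple(8, 4))  # already divisible, stays 8
```

Because the function is idempotent for already-divisible inputs, applying it twice (as the duplicated block would) is harmless in behavior, only redundant, which is why the fix is a pure deletion.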
The adaptation of the MLA part LGTM. I'm OK with deleting the deepseek_v2 modeling after this PR is merged.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from b8e0d5e to 85b1d42
Force-pushed from 71aeb39 to 1ebbd54
- torchair mtp failed in
- test_models_distributed_Qwen_Dense_with_flashcomm_v1 failed in https://github.com/vllm-project/vllm-ascend/actions/runs/18611921289/job/53071268128?pr=3504
- tp+ep failed due to dispatching to different communication ops: https://github.com/vllm-project/vllm-ascend/actions/runs/18616600778/job/53081964431?pr=3504
Force-pushed from 366fa84 to a1038da
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Signed-off-by: Icey <1790571317@qq.com>
* fix bert model
* fix guided decoding
* revert skipped e2e test
* fix lora vllm-project/vllm#25807
* fix vl

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Force-pushed from a1038da to b6a9207
e2e tests on single card failed only with test_mtp_torchair_correctness_piecewise and test_mtp_torchair_correctness_full in https://github.com/vllm-project/vllm-ascend/actions/runs/18616600778/job/53081964430?pr=3504
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Closing as done in #3612
What this PR does / why we need it?
How was this patch tested?
Tested locally and passed with AscendScheduler enabled + torchair/eager mode on DeepSeek-V3.2-Exp-W8A8.