[1/N][Refactor] Refactor code to adapt with vllm main #3612
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the code to adapt to the vllm main branch. The changes include removing an unused pre-commit hook, adding version-dependent logic for scheduler initialization and the compilation level, and updating attention mechanisms for compatibility with different vllm versions. The review focuses on identifying potential issues related to version compatibility and code maintainability, specifically targeting high- and critical-severity issues.
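For illustration, version-dependent logic of this kind is usually gated on the installed vllm version. A minimal sketch of that pattern for the compilation-level rename, assuming a version-check helper named `vllm_version_is` (the helper name and pinned version are placeholders, not necessarily what this PR uses):

```python
# Hedged sketch: pick the right symbol depending on the installed vllm.
# `vllm_version_is` is an assumed helper; the pinned version is a placeholder.
from vllm_ascend.utils import vllm_version_is

if vllm_version_is("0.11.0"):
    # Released vllm still exposes CompilationLevel.
    from vllm.config import CompilationLevel as CompilationMode
else:
    # vllm main renamed CompilationLevel to CompilationMode (vllm#26355).
    from vllm.config import CompilationMode
```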
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from 5a6a209 to d606f30.
```diff
 def version_check():
     """check if torch_npu version >= dev20250919"""
-    import re
+    import re  # noqa
```
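For context, a plausible full implementation of that check, assuming torch_npu embeds a `devYYYYMMDD` tag in its version string (a sketch only, not the exact code in this PR):

```python
import re

import torch_npu


def version_check() -> bool:
    """Check if the installed torch_npu is a dev build from 20250919 or later."""
    # torch_npu dev builds typically embed a date tag, e.g. "2.5.1.dev20250919".
    match = re.search(r"dev(\d{8})", torch_npu.__version__)
    if match is None:
        # Not a dev build: treat it as not satisfying the dev-date requirement.
        return False
    return int(match.group(1)) >= 20250919
```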
Makes sense for performance.
Add fill_(0) in attention for vllm-project/vllm#26680.
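As a sketch of what such a zero-fill looks like (shapes and names here are illustrative only): pre-filling the attention output buffer means a dummy run that skips the real attention kernel never reads uninitialized memory.

```python
import torch

# Illustrative shapes only.
num_tokens, num_heads, head_size = 8, 16, 128

# Allocate the attention output buffer and zero it explicitly, so a dummy
# run that bypasses the attention kernel never observes garbage values.
output = torch.empty(num_tokens, num_heads * head_size)
output.fill_(0)
```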
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from e7bc4b6 to f23817d.
Very hard work!
LGTM
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
* fix torchair deepseekv2 modeling when q_lora_rank is None
* add fill_(0) for attn output in dummy run
* add comments for NPUWorker.device
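On the first bullet: DeepSeek-V2-style MLA uses a low-rank query projection only when `q_lora_rank` is set, so the modeling code needs a branch for the `None` case. A schematic of that guard, with simplified, hypothetical module names rather than the actual torchair code:

```python
from typing import Optional

import torch
import torch.nn as nn


class MLAQueryProj(nn.Module):
    """Schematic DeepSeek-V2-style query projection with optional LoRA rank."""

    def __init__(self, hidden_size: int, num_heads: int, head_dim: int,
                 q_lora_rank: Optional[int]):
        super().__init__()
        self.q_lora_rank = q_lora_rank
        out_dim = num_heads * head_dim
        if q_lora_rank is not None:
            # Compressed path: down-project, normalize, then up-project.
            self.q_a_proj = nn.Linear(hidden_size, q_lora_rank, bias=False)
            self.q_a_layernorm = nn.LayerNorm(q_lora_rank)
            self.q_b_proj = nn.Linear(q_lora_rank, out_dim, bias=False)
        else:
            # q_lora_rank is None: fall back to a single full-rank projection.
            self.q_proj = nn.Linear(hidden_size, out_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.q_lora_rank is not None:
            return self.q_b_proj(self.q_a_layernorm(self.q_a_proj(x)))
        return self.q_proj(x)
```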
Force-pushed from 6512d4b to 05599bc.
This PR, or the updated vllm code, may introduce some synchronize operations somewhere, which breaks aclgraph in the MTP scenario. But I think this PR is more important, so I recommend merging it first once CI passes, and I will fix the above issue later.
…ion tests (#3729)

### What this PR does / why we need it?
Enable the unit tests that #3612 skipped.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Unit tests.

- vLLM main: vllm-project/vllm@17c540a

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
### What this PR does / why we need it?
[UT] fix ut test for test_utils that #3612 skipped.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@17c540a
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
### What this PR does / why we need it?
This is step 1 of refactoring the code to adapt to vllm main; this PR is aligned with vllm-project/vllm@17c540a.

- Refactor deepseek to the latest code architecture as of vllm-project/vllm@17c540a
- A bunch of fixes due to vllm changes (a hedged import-shim sketch follows this list):
  - `AscendScheduler.__post_init__`, caused by "[Bugfix]: Clean up chunked prefill logging when using whisper" (vllm#25075)
  - `AscendScheduler` init got an unexpected arg `block_size`, caused by "[bugfix][DCP] fix block_size of hash in DCP prefix caching" (vllm#26296)
  - `KVCacheManager.get_num_common_prefix_blocks` arg, caused by "fix(v1/kv_cache): resolve async KV transfer bug in cascade attention" (vllm#23485)
  - `MLAAttention` import, caused by "Separate MLAAttention class from Attention" (vllm#25103)
  - `SharedFusedMoE` import, caused by "[Model] Apply shared experts overlap optimization to all models with shared experts" (vllm#26145)
  - `LazyLoader` import, caused by "[Chore] Separate out `vllm.utils.import_utils`" (vllm#27022)
  - `vllm.utils.swap_dict_values` import, caused by "[Chore] Separate out `vllm.utils.collections`" (vllm#26990)
  - `Backend` enum import, caused by "[Attention] Move Backend enum into registry" (vllm#25893)
  - `CompilationLevel` renamed to `CompilationMode`, introduced by "[Frontend][torch.compile] CompilationConfig Overhaul" (#20283): rename compilation level to compilation mode, deprecate compilation level (vllm#26355)
  - `inputs_embeds`, caused by "[Bugfix] Token type and position embeddings fail to be applied to `inputs_embeds`" (vllm#25922)
  - `get_input_positions_tensor` to `get_mrope_input_positions`, caused by "[Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py" (vllm#24172)
  - `splitting_ops` changes, introduced by "[torch.compile] Make inductor partition rules respect splitting_ops #25691" (vllm#25845)
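Several of the import items above are commonly handled with a try/except shim so the plugin runs against both released vllm and vllm main. A minimal sketch under that assumption; the module paths are inferred from the linked PR titles and may not match this PR's actual code:

```python
# Hedged import shim for helpers that moved on vllm main.
try:
    # After vllm#27022 and vllm#26990 split vllm.utils into submodules.
    from vllm.utils.import_utils import LazyLoader
    from vllm.utils.collections import swap_dict_values
except ImportError:
    # Older vllm keeps both helpers directly under vllm.utils.
    from vllm.utils import LazyLoader, swap_dict_values
```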
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
CI passed with existing tests.