[V1][ModelRunner] Support pooling model for v1 engine #1359
Conversation
Codecov Report
❌ Attention: Patch coverage is 75.81%. Your patch check has failed because the patch coverage (75.81%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@ Coverage Diff @@
## main #1359 +/- ##
==========================================
+ Coverage 27.39% 30.89% +3.49%
==========================================
Files 56 59 +3
Lines 6191 6756 +565
==========================================
+ Hits 1696 2087 +391
- Misses 4495 4669 +174
Flags with carried forward coverage won't be shown.
@Yikun @wangxiyuan I think this should be ready for review.
Please add the UT for npu_input_batch as well.
@@ -80,6 +80,14 @@
from vllm_ascend.worker.eagle_proposer_v1 import EagleProposer
from vllm_ascend.worker.mtp_proposer_v1 import MtpProposer

if vllm_version_is("0.9.1"):
Is this necessary? Can't vllm_ascend.worker.npu_input_batch support vllm 0.9.1?
input_batch in vllm v0.9.1 has been verified and will not introduce any breaking changes. If we used a vllm_ascend customized interface, we would also need version-compatibility handling for vllm v0.9.1.
For example, from vllm.v1.pool.metadata import PoolingMetadata in input_batch needs version-compatibility handling.
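A minimal sketch of the version-gated import pattern discussed here; the import path of the vllm_version_is helper and the 0.9.1 fallback are assumptions, not the PR's exact code:

```python
# Sketch only: gate imports of modules that do not exist in vllm v0.9.1.
from vllm_ascend.utils import vllm_version_is  # helper location assumed

if vllm_version_is("0.9.1"):
    # vllm v0.9.1 does not ship vllm.v1.pool.metadata, so fall back here.
    PoolingMetadata = None  # placeholder; a 0.9.1-compatible shim would go here
else:
    from vllm.v1.pool.metadata import PoolingMetadata
```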
This pull request has conflicts, please resolve those before we can evaluate the pull request.
This has been ready for review for a long time. I'll merge this first.
What this PR does / why we need it?
Change as little existing code as possible to add support for the v1 pooling task. Note that I moved vllm.v1.worker.gpu_input_batch down into vllm-ascend; considering the frequent changes in upstream interfaces, I moved it here to decouple from them.
Does this PR introduce any user-facing change?
How was this patch tested?
CI passed with newly added and existing tests. A simple test, adapted from https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B, was also run locally first, like the sketch below:
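A minimal sketch of such a local embedding test, assuming vLLM's offline embed API (task="embed" and LLM.embed); the prompts are illustrative and the PR's actual script is not reproduced here:

```python
# Local smoke test sketch for the v1 pooling path (assumed vLLM offline API).
from vllm import LLM

prompts = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts objects toward each other.",
]

# Qwen3-Embedding-0.6B is the model the author references on ModelScope.
llm = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")

outputs = llm.embed(prompts)
for prompt, output in zip(prompts, outputs):
    embedding = output.outputs.embedding  # list[float]
    print(f"{prompt!r} -> embedding of length {len(embedding)}")
```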
The results looked good.