[Model][4/N] Automatic conversion of CrossEncoding model #19675
Conversation
Summary of Changes
Hello @noooop, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the automatic model conversion mechanism to support both classification and scoring tasks under a unified `as_seq_cls_model` adapter. It updates the model loading utility to recognize the `score` task and apply the generic adapter. The adapter itself is refactored to handle pooling and scoring more flexibly, and the Qwen3 sequence classification model is updated to use this new generic adapter.
Highlights
- **Score Task Support**: The automatic model conversion logic in the model loader (`vllm/model_executor/model_loader/utils.py`) is updated to use the `as_seq_cls_model` adapter for models specified with the `score` task, in addition to the existing `classify` task.
- **Adapter Refactoring**: The `as_seq_cls_model` adapter in `vllm/model_executor/models/adapters.py` is refactored. The pooling and scoring logic is moved into a dedicated `pooler` method, allowing for more flexible handling of different pooling types within the adapter. It also adds a check to squeeze the output dimension for the `score` task (a minimal sketch of this adapter pattern follows the list).
- **Qwen3 Reranker Integration**: The specific `Qwen3ForSequenceClassification` implementation is updated to inherit from the new generic `as_seq_cls_model` adapter, simplifying its structure and leveraging the shared adapter logic. Specific Qwen3 reranker configuration verification is moved into a new `config_verify` method.
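To make the adapter pattern concrete, here is a minimal sketch of the shape described above. The `score` projection head and `task` attribute are assumptions for illustration; this is not vLLM's actual implementation.

```python
from typing import TypeVar

import torch

_T = TypeVar("_T", bound=type)


def as_seq_cls_model(cls: _T) -> _T:
    """Wrap a *ForCausalLM class with a sequence-classification pooler."""

    class ModelForSequenceClassification(cls):  # type: ignore[misc, valid-type]

        def pooler(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # Project pooled hidden states to num_labels logits
            # (assumes the wrapped model defines a `score` head).
            logits = self.score(hidden_states)
            if self.task == "score":  # assumed attribute, for illustration
                # The score task expects num_labels == 1; squeeze that dim.
                logits = logits.squeeze(-1)
            return logits

    return ModelForSequenceClassification  # type: ignore[return-value]
```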
Code Review
This pull request focuses on enabling automatic conversion of score models by adapting the sequence classification model functionality. Key changes include:

- Renaming `as_classification_model` to `as_seq_cls_model` and updating its functionality to support both "classify" and "score" tasks. This is consistently applied across documentation, tests, and model loading utilities.
- Refactoring `Qwen3ForSequenceClassification` to leverage the new `as_seq_cls_model` adapter. This promotes code reuse and centralizes the classification/scoring logic.
- Introducing a `config_verify` method in the adapter pattern, allowing model-specific configurations, which is well utilized by `Qwen3ForSequenceClassification` for its reranker variant.
- Ensuring that for "score" tasks, the model expects `num_labels == 1` and the output is appropriately processed (squeezed).

The changes appear robust and improve the model adaptation framework. One area for potential clarification is the behavior of `PoolingType.ALL` within the `as_seq_cls_model` adapter, as noted in the specific comment.

Please also consider filling out the checklist in the PR description (Purpose, Test Plan, Test Result) for completeness.
This pull request has merge conflicts that must be resolved before it can be merged.
`_ModelRegistry.is_cross_encoder_model` does not take `as_seq_cls_model` into account. It seems I need to spend some more time fixing it.
I find it difficult to fully fix `is_cross_encoder`, and I am not familiar with this part of the code. So I will use a temporary solution to fix it and document it in Known issues.

How did this issue occur:

- In `vllm/model_executor/models/registry.py`, `GemmaForSequenceClassification` is routed to `GemmaForCausalLM`. But `_ModelRegistry.is_cross_encoder_model` does not take `as_seq_cls_model` into account, so it derives `is_cross_encoder_model == False` for `GemmaForSequenceClassification`, which leads to a wrong calculation method.
- In `vllm/model_executor/models/registry.py`, `Qwen3ForSequenceClassification` is routed to the real `Qwen3ForSequenceClassification` class, so it correctly derives `is_cross_encoder_model == True`.

One could add registration code for every such model in `vllm/model_executor/models/registry.py`, but this is clearly too tedious (see the sketch below).
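For reference, here is a reconstructed sketch of the kind of per-model registration meant here, based on the workaround adopted later in this thread; the `_SEQ_CLS_MODELS` dict name is illustrative, not vLLM's actual registry table.

```python
from vllm.model_executor.models.adapters import as_seq_cls_model
from vllm.model_executor.models.gemma import GemmaForCausalLM
from vllm.model_executor.models.qwen2 import Qwen2ForCausalLM

# Materialize a converted class for each *ForCausalLM model...
GemmaForSequenceClassification = as_seq_cls_model(GemmaForCausalLM)
Qwen2ForSequenceClassification = as_seq_cls_model(Qwen2ForCausalLM)

# ...and register each one: architecture name -> (module, class).
# One pair of lines per model is what makes this approach tedious.
_SEQ_CLS_MODELS = {
    "GemmaForSequenceClassification": ("gemma", "GemmaForSequenceClassification"),
    "Qwen2ForSequenceClassification": ("qwen2", "Qwen2ForSequenceClassification"),
}
```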
```diff
@@ -176,7 +178,8 @@ def as_classification_model(cls: _T) -> _T:
         default_softmax=True,
     )

-    class ModelForClassification(ModelForPooling):
+    class ModelForSequenceClassification(ModelForPooling,
+                                         SupportsCrossEncoding):
```
Why can't `is_cross_encoder_model` return true for this model?
Because this code block is not run inside `ModelRegistry.is_cross_encoder_model`:

```python
model_cls, arch = ModelRegistry.resolve_model_cls(architectures)
if model_config.task == "embed":
    model_cls = as_embedding_model(model_cls)
elif model_config.task in ["classify", "score"]:
    model_cls = as_seq_cls_model(model_cls)
elif model_config.task == "reward":
    model_cls = as_reward_model(model_cls)
```

`_ModelRegistry.is_cross_encoder_model` does not take `as_seq_cls_model` into account.
We can fix it in two ways; I'm not sure which one you prefer.

- Let `_ModelRegistry.is_cross_encoder_model` take `as_seq_cls_model` into account, which requires passing a task parameter and modifying many interfaces; there might also be duplicate code (see the sketch after this list).
- Automatically add, for each `*ForCausalLM`, `XXXForSequenceClassification = as_seq_cls_model(XXXForCausalLM)` plus a registry entry `"XXXForSequenceClassification": ("xxx", "XXXForSequenceClassification")`, although I haven't figured out how to do it yet.
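For concreteness, here is a rough sketch of what the first option might look like; the `task` parameter and signature are assumptions for illustration, not vLLM's actual API.

```python
from typing import Optional, Union


def is_cross_encoder_model(
    self,
    architectures: Union[str, list[str]],
    task: Optional[str] = None,
) -> bool:
    # Resolve the base class, then apply the same task-based conversion
    # that the model loader applies (see the loader code quoted above).
    model_cls, _ = self.resolve_model_cls(architectures)
    if task in ("classify", "score"):
        model_cls = as_seq_cls_model(model_cls)
    return supports_cross_encoding(model_cls)
```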
Sorry, I don't quite get it. In this code the model is converted into `ModelForSequenceClassification` via `as_seq_cls_model` when `--task score`, which allows the model to be detected as `is_cross_encoder_model`, right?
```python
@property
def is_cross_encoder(self) -> bool:
    return self.registry.is_cross_encoder_model(self.architectures)  # <- here


def is_cross_encoder_model(
    self,
    architectures: Union[str, list[str]],
) -> bool:
    model_cls, _ = self.inspect_model_cls(architectures)  # <- here
    return model_cls.supports_cross_encoding


@dataclass(frozen=True)
class _LazyRegisteredModel(_BaseRegisteredModel):
    """
    Represents a model that has not been imported in the main process.
    """
    module_name: str
    class_name: str

    # Performed in another process to avoid initializing CUDA
    def inspect_model_cls(self) -> _ModelInfo:
        return _run_in_subprocess(
            lambda: _ModelInfo.from_model_cls(self.load_model_cls()))  # <- here

    def load_model_cls(self) -> type[nn.Module]:
        mod = importlib.import_module(self.module_name)
        return getattr(mod, self.class_name)


@staticmethod
def from_model_cls(model: type[nn.Module]) -> "_ModelInfo":
    return _ModelInfo(
        architecture=model.__name__,
        is_text_generation_model=is_text_generation_model(model),
        is_pooling_model=True,  # Can convert any model into a pooling model
        # <- here: this expects as_seq_cls_model(GemmaForCausalLM),
        # but actually receives GemmaForCausalLM
        supports_cross_encoding=supports_cross_encoding(model),
        supports_multimodal=supports_multimodal(model),
        supports_pp=supports_pp(model),
        has_inner_state=has_inner_state(model),
        is_attention_free=is_attention_free(model),
        is_hybrid=is_hybrid(model),
        supports_transcription=supports_transcription(model),
        supports_v0_only=supports_v0_only(model),
        has_noops=has_noops(model),
    )
```

This path never runs `model_cls = as_seq_cls_model(model_cls)`, so `_ModelRegistry.is_cross_encoder_model` does not take `as_seq_cls_model` into account.
When computing `_resolve_task`, it calls `if self.registry.is_cross_encoder_model(architectures)`, so a circular reference might be formed:

```python
def _get_preferred_task(
    self,
    architectures: list[str],
    supported_tasks: set[_ResolvedTask],
) -> Optional[_ResolvedTask]:
    model_id = self.model
    if get_pooling_config(model_id, self.revision):
        return "embed"
    if self.registry.is_cross_encoder_model(architectures):  # <- here
        return "score"
    if self.registry.is_transcription_model(architectures):
        return "transcription"

    suffix_to_preferred_task: list[tuple[str, _ResolvedTask]] = [
        # Other models follow this pattern
        ("ForCausalLM", "generate"),
        ("ForConditionalGeneration", "generate"),
        ("ForSequenceClassification", "classify"),
        ("ChatModel", "generate"),
        ("LMHeadModel", "generate"),
        ("EmbeddingModel", "embed"),
        ("RewardModel", "reward"),
    ]
    _, arch = self.registry.inspect_model_cls(architectures)

    for suffix, pref_task in suffix_to_preferred_task:
        if arch.endswith(suffix) and pref_task in supported_tasks:
            return pref_task

    return None
```

So `_resolve_task` and `_get_preferred_task`, as well as `inspect_model_cls`, need to be refactored.

┑( ̄Д  ̄)┍
Can I solve this problem in another PR, as it is very complex and exceeds the scope of this PR?

I think redirecting `*ForSequenceClassification` to `*ForCausalLM` makes things complicated, e.g.:

```python
"GemmaForSequenceClassification": ("gemma", "GemmaForCausalLM"),
```

With this, `*ForCausalLM` might be used for generate, classify, embed, and score. Among them, embed and score are difficult to distinguish; we need to pass the task parameter to tell them apart. If parsing fails, incorrect calculation methods will be used and wrong results will be obtained.

We should use the code below instead:

```python
"GemmaForSequenceClassification": ("gemma", "GemmaForSequenceClassification"),
```

Then `*ForSequenceClassification` prefers the classify & score tasks.

I even think we should not distinguish between the classify & score tasks. If there is a `*ForSequenceClassification` model, we should allow users to use both the classify API and the score API. Their results should be the same; the difference is whether users concatenate queries and documents themselves or vLLM does the concatenation.
```python
from vllm import LLM

model = LLM(model="BAAI/bge-reranker-base", task="score")
text_1 = "ping"
text_2 = "pong"
outputs = model.score(text_1, text_2)
print(outputs)
# [ScoringRequestOutput(request_id='0', outputs=ScoringOutput(score=0.77197265625), prompt_token_ids=[0, 33429, 2, 2, 114007, 2], finished=True)]
```

```python
from vllm import LLM

model = LLM(model="BAAI/bge-reranker-base", task="classify")
text_1 = "ping"
text_2 = "pong"
# after changing the output dimensions slightly
outputs = model.classify([f'{text_1}</s></s>{text_2}'])
print(outputs)
# [ClassificationRequestOutput(request_id='0', outputs=ClassificationOutput(num_classes=1), prompt_token_ids=[0, 33429, 2, 2, 114007, 2], finished=True)]
```
Would it help if we have a separate model class for `--task score`? Then if the model is `*ForSequenceClassification`, we first default to `--task classify`, so the user should explicitly set `--task score` in order to use it as a cross encoder.
By adding:

```python
GemmaForSequenceClassification = as_seq_cls_model(GemmaForCausalLM)
Qwen2ForSequenceClassification = as_seq_cls_model(Qwen2ForCausalLM)
```

```python
"GemmaForSequenceClassification": ("gemma", "GemmaForSequenceClassification"),
"Qwen2ForSequenceClassification": ("qwen2", "Qwen2ForSequenceClassification"),
```

Now there are no test failures and the code is not too dirty, so let's merge this PR first. Let us discuss and solve the issue completely in the next PR.
Thanks for getting this to work!
Now I have to deal with a big issue that will be handled in the next PR: all `*ForSequenceClassification` models now have `is_cross_encoder_model == True`, so the test below will fail with `AssertionError: assert 'score' == 'classify'`. Because the impact is significant, I will first assume that this PR has been merged and prepare the next PR. Once the solution in the next PR is approved, this PR can then be merged.
TL;DR

Hope that after merging this PR, vLLM can support more LLMs that use the relevance generation method as classifiers and rerankers.
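To spell out what "relevance generation" means here, a minimal sketch under stated assumptions: the causal LM is prompted with a query/document pair in a template that asks for a yes/no relevance judgment, and the score is read off the yes/no token logits at the final position. The helper name and shapes are illustrative, not this PR's API.

```python
import torch


def relevance_score(last_logits: torch.Tensor, yes_id: int, no_id: int) -> float:
    """Turn final-position logits into P("yes" | query, document)."""
    # last_logits: [vocab_size] logits at the last prompt position.
    pair = torch.stack([last_logits[no_id], last_logits[yes_id]])
    return torch.softmax(pair, dim=0)[1].item()
```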
Usage
converting2seq_cls_models.py:
PTAL #19260
Caution
"Yes" and "yes" are two different tokens
requests demo + formatting query & document:
expected output
If someone wants to implement an offline conversion from `ForCausalLM` to `ForSequenceClassification`, or support new methods or new models, please refer to https://github.com/noooop/snippet/tree/main/converting2SequenceClassification (I don't know where to place this code in vllm.)
Essential Elements of an Effective PR Description Checklist

- Update `supported_models.md` and `examples` for a new model.

Purpose
Follow-up to #11469; further improves #10674.
Test Plan
Test Result
(Optional) Documentation Update
Known issues
vllm/model_executor/models/registry.py, lines 160 to 161 (at commit 6bc7b57)
In vLLM, the score task only supports `num_labels == 1`, while models with `num_labels == 1` in sentence-transformers use Sigmoid by default: https://github.com/UKPLab/sentence-transformers/blob/910ed144dfc0a08f31517b0d01580302015fa408/sentence_transformers/cross_encoder/CrossEncoder.py#L485-L487

Perhaps we should update the documentation to set `default_softmax=True` when the task is score, consistent with the implementation in sentence-transformers, and we should pin the sentence-transformers version to >= 4.1.0.
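To illustrate why the activation choice matters when `num_labels == 1` (sketch with a made-up logit value):

```python
import torch

logit = torch.tensor([2.5])  # a single-label classification head output

# sentence-transformers default for num_labels == 1: Sigmoid gives a
# meaningful score in (0, 1).
print(torch.sigmoid(logit))          # tensor([0.9241])

# Softmax over a single label is always 1.0, which destroys the score.
print(torch.softmax(logit, dim=-1))  # tensor([1.])
```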
Change `verify_and_update_config` into a class method in the future and call it when initializing `model_config`. Fixed by [Model][1/N] Automatic conversion of CrossEncoding model #20012.
Template-aware prompt truncation to avoid cutting off important instructions.
Alibaba-NLP/gte-Qwen2-1.5B-instruct & Alibaba-NLP/gte-modernbert-base fail with: `NotImplementedError: Encoder self-attention and encoder/decoder cross-attention are not implemented for FlashAttentionImpl`
Fix #19673
Fix #20051