[Model] Consolidate pooler implementations #20927
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Summary of Changes
Hello @DarkLight1337, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring of the pooling and classification mechanisms within the model executor. The primary goal is to consolidate and standardize the implementation of pooling layers, especially for sequence classification tasks, by introducing and integrating a unified `ClassifierPooler`. This change aims to enhance code consistency, modularity, and maintainability across various models that perform classification.
Highlights
- Consolidated Pooling Logic: The `ClassifierPooler` is now the central component for handling pooling, applying the classifier, and managing activation functions for sequence classification models. This unifies the approach across different model types (a rough sketch of this flow follows this list).
- Refactored `Pooler` Class: The base `Pooler` class has been refactored to extract its activation logic into a dedicated `activation` method, improving modularity and reusability.
- Standardized `as_seq_cls_model`: Models utilizing the `as_seq_cls_model` decorator now instantiate and use the `ClassifierPooler` for their classification head, simplifying their internal `pooler` method and removing redundant logic.
- Jamba Model Integration: The `JambaForSequenceClassification` model has been updated to leverage the new `ClassifierPooler` for its reward head, streamlining its implementation and introducing a custom weight loader to ensure `float32` precision for its score weights.
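To make the consolidation above concrete, here is a minimal sketch of the `classify(pool(hidden_states))` flow this PR moves models toward. It is purely illustrative and not the vLLM implementation: the class name, the LAST pooling choice, and the dimensions below are all assumptions.

```python
# Illustrative sketch only -- not vLLM's ClassifierPooler. It mirrors the
# classify(pool(hidden_states)) order discussed later in this review thread.
import torch
import torch.nn as nn


class SimpleClassifierPooler(nn.Module):
    """Pool hidden states, apply the classifier head, then the activation."""

    def __init__(self, classifier: nn.Module, act_fn: nn.Module):
        super().__init__()
        self.classifier = classifier
        self.act_fn = act_fn

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        pooled = hidden_states[-1]        # LAST pooling: final token's state
        logits = self.classifier(pooled)  # classification / reward head
        return self.act_fn(logits)        # e.g. softmax or sigmoid


# Usage: a 2-class head over 16-dim hidden states of a 5-token sequence.
pooler = SimpleClassifierPooler(nn.Linear(16, 2), nn.Softmax(dim=-1))
scores = pooler(torch.randn(5, 16))  # shape: (2,)
```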
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.
Code Review
This pull request refactors the pooler implementations to use a consolidated `ClassifierPooler`. I've found a critical issue in both `adapters.py` and `jamba.py` where the new `ClassifierPooler` is initialized without a `pooler` instance, causing the pooling step to be skipped. I've provided suggestions to fix this. For `jamba.py`, this fix also changes the order of operations from `pool(classify(h))` to the more standard `classify(pool(h))`, which will affect the output as you noted in the PR description.
```python
def _init_pooler(self, vllm_config: "VllmConfig"):
    pooler_config = vllm_config.model_config.pooler_config
    assert pooler_config is not None

    return ClassifierPooler(
        vllm_config.model_config,
        self.score,
        act_fn=super()._init_pooler(vllm_config).head.activation,
    )
```
The `ClassifierPooler` is initialized without a `pooler` instance. This will cause the pooling step (`extract_states`) to be skipped inside `ClassifierPooler.forward`, as `self.pooler` will be `None`. This is a correctness issue as the classifier will be applied on unpooled hidden states, which is likely not the intended behavior.

To fix this, you should create the `inner_pooler` first and pass it to the `ClassifierPooler`.
Suggested change:

```diff
 def _init_pooler(self, vllm_config: "VllmConfig"):
     pooler_config = vllm_config.model_config.pooler_config
     assert pooler_config is not None
+    inner_pooler = super()._init_pooler(vllm_config)
     return ClassifierPooler(
         vllm_config.model_config,
         self.score,
-        act_fn=super()._init_pooler(vllm_config).head.activation,
+        act_fn=inner_pooler.head.activation,
     )
```
vllm/model_executor/models/jamba.py
```python
self._pooler = ClassifierPooler(
    vllm_config.model_config,
    classifier=self.score,
    act_fn=inner_pooler.head.activation,
```
The `ClassifierPooler` is initialized without a `pooler` instance. This will cause the pooling step to be skipped, which is a change from the previous behavior and likely incorrect. The classifier will be applied on unpooled hidden states.

Additionally, the original implementation for Jamba was `pool(classify(hidden_states))`, while the standard approach (and what `ClassifierPooler` implements with the suggested fix) is `classify(pool(hidden_states))`. This change in logic will affect the model's output, which you've noted in the PR description. Please confirm if this logic change is intended.

To fix the missing pooling step, you should pass the `inner_pooler` to the `ClassifierPooler`.
Suggested change:

```diff
+inner_pooler = Pooler.from_config_with_defaults(
+    pooler_config,
+    pooling_type=PoolingType.LAST,
+    normalize=False,
+    softmax=False,
+)
 self._pooler = ClassifierPooler(
     vllm_config.model_config,
     classifier=self.score,
     act_fn=inner_pooler.head.activation,
+)
```
Let me fix some bugs locally first
I'm going to implement this todo next
Related to #19925
Nice refactoring!
Had to do another round of refactoring, PTAL again
```python
        return build_output(pooled_data)


class StepPooler(BasePooler):
```
This doesn't really fit in with the other pooling types so I made this its own class
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Overall LGTM!
Nice refactoring! +1
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```python
    return None


def get_classification_activation_function(config: PretrainedConfig):
```
These functions aren't used anywhere outside of `pooler.py` so I moved them
"""Check if the model uses step pooler.""" | ||
return is_pooling_model(model) and any( | ||
type(module).__name__ == "StepPool" for module in model.modules()) | ||
type(module).__name__ == "StepPooler" for module in model.modules()) |
This looks hacky. I'm planning to require models to define `pooler` as a `BasePooler` instance in the next PR so we can directly inspect `model.pooler` to get this information
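A minimal sketch of what that direct inspection could look like (hypothetical follow-up code, not part of this PR; the import path and the `pooler` attribute are assumptions):

```python
# Hypothetical sketch, not code from this PR or the planned follow-up.
# Assumes models expose a `pooler` attribute and that StepPooler is
# importable from the pooler module (the import path is an assumption).
from vllm.model_executor.layers.pooler import StepPooler


def uses_step_pooler(model) -> bool:
    pooler = getattr(model, "pooler", None)
    return isinstance(pooler, StepPooler)
```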
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Set `MTEB_RERANK_TOL = 0.01`. I will switch to a more robust test as soon as possible.
We cannot rule out a potential numerical problem with this PR, but it would be very difficult to find it.
https://buildkite.com/vllm/ci/builds/24088/steps/canvas?sid=01981164-83cc-49e3-bc15-5b73f41a98ba
https://buildkite.com/vllm/ci/builds/23961/steps/canvas?sid=01980c3e-3f93-48c6-9c00-f25ec6b40f6a (this PR)
Yeah that is what I mean, trying to find out why the score decreased in this PR
Maybe there is indeed a potential numerical stability problem, and it does not affect the BERT-like models. BAAI/bge-reranker-v2-gemma is relatively stable and can be used for debugging. I think the difference is relatively small; we can first set `MTEB_RERANK_TOL = 0.01` and then track it down gradually.
There is a detail that may need attention. Because V1 does not support Gemma, V0 is used. See vllm/tests/models/language/pooling/test_bge_reranker_v2_gemma.py, lines 121 to 124 in d0dc4cf.
So are the numerics of V1 less stable?
I think I found the issue. On the main branch, the data passed to the activation function is in float32, but in this PR it is in the model's dtype. Let's see if using float32 fixes the problem.
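A quick, purely illustrative way to see the kind of drift being described (this is not the vLLM code path, just the activation applied in float32 versus a lower-precision model dtype such as bfloat16):

```python
# Illustrative only: compare applying the activation in float32 vs. bfloat16.
import torch

logits = torch.randn(8, dtype=torch.float32) * 5

scores_fp32 = torch.sigmoid(logits)                     # activation in float32
scores_bf16 = torch.sigmoid(logits.to(torch.bfloat16))  # activation in model dtype

max_diff = (scores_fp32 - scores_bf16.to(torch.float32)).abs().max().item()
print(f"max difference between the two paths: {max_diff:.6f}")
```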
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
buildkite/ci/pr/language-models-test-extended-pooling passed for BAAI/bge-reranker-v2-gemma. The reranker test isn't completely useless.
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
### What this PR does / why we need it?
- Fix broken commit by [#20927](vllm-project/vllm#20927)
- Fix broken commit by [#20466](vllm-project/vllm#20466)
- TODO: more fully adapt to the upstream reconstruction, let's first make CI happy
- vLLM version: v0.9.2
- vLLM main: vllm-project/vllm@11dfdf2

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Himanshu Jaju <hj@mistral.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: x22x22 <wadeking@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Paul Pak <paulpak58@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Essential Elements of an Effective PR Description Checklist
- `supported_models.md` and `examples` for a new model.

Purpose
Use `ClassifierPooler` in `as_seq_cls_model` and `JambaForSequenceClassification`.

cc @noooop @maxdebayser

cc @yecohn @tomeras91 can you help check whether the output of Jamba-reward remains the same? There are no tests that check the correctness of this model.
Test Plan
Test Result
(Optional) Documentation Update