[Model] Systematic support for fp32 head, pooling models part #23810
Conversation
Definitely an improvement, thanks. I've left a suggestion regarding the ModelConfig changes.
Ready for review.
Please take a look at this thread.
Purpose
"head" refers to the last Linear layer(s) of an LLM, such as the lm_head in a generation model, or the score or classifier in a classification model.
An increasing amount of evidence suggests that using an fp32 head can improve numerical precision (a minimal sketch follows the references below):
[Feature]: Support casting lm_head to FP32 to get old logprobs in RLHF #19925
[Model] Consolidate pooler implementations #20927 (comment)
vllm/vllm/model_executor/models/jamba.py, lines 607 to 617 @ 11a7faf
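For concreteness, here is a minimal PyTorch sketch of what "using an fp32 head" means for a classification model; the layer name score, the shapes, and the dtypes are illustrative, not the actual vLLM implementation:

```python
import torch
import torch.nn as nn

hidden_size, num_labels = 1024, 2

# Hypothetical classifier head; the rest of the model runs in bf16.
score = nn.Linear(hidden_size, num_labels, bias=False, dtype=torch.bfloat16)
hidden_states = torch.randn(4, hidden_size, dtype=torch.bfloat16)

# bf16 head: weights and output stay in bfloat16.
logits_bf16 = score(hidden_states)

# fp32 head: upcast the final projection so the logits (and any downstream
# softmax/sigmoid) are computed and kept in float32.
logits_fp32 = hidden_states.float() @ score.weight.float().t()
```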
Let's add systematic support for the fp32 head.
Add VLLM_USING_FP32_HEAD: 1 to enable, 0 to disable, "" for the default. Generation models default to not using the fp32 head because the lm_head shape is [hidden_size, vocab_size]; set VLLM_USING_FP32_HEAD=1 to enable it there. Generation support will be implemented in the next PR.
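A rough sketch of the intended flag semantics; the helper name is hypothetical and the pooling-model default is my reading of this PR's scope, not the actual vLLM code path:

```python
import os
import torch

def resolve_head_dtype(model_dtype: torch.dtype, is_pooling_model: bool) -> torch.dtype:
    """Illustrative only: map VLLM_USING_FP32_HEAD onto a head dtype.

    "1" forces a float32 head, "0" forces the model dtype, and "" (unset)
    keeps the default: float32 for pooling models (this PR), model dtype
    for generation models, whose lm_head is [hidden_size, vocab_size].
    """
    flag = os.environ.get("VLLM_USING_FP32_HEAD", "")
    if flag == "1":
        return torch.float32
    if flag == "0":
        return model_dtype
    return torch.float32 if is_pooling_model else model_dtype
```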
cc @DarkLight1337 @maxdebayser
Test Plan
Keep CI green.
MTEB test
Test Result
MTEB test (the higher the score, the better). Difference: st_main_score - vllm_main_score.
A negative diff indicates that vLLM with its default dtype performs better on the MTEB test than SentenceTransformers with torch.float32.
(That's not entirely impossible, right?)
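To make the sign convention explicit, a tiny sketch with made-up numbers:

```python
# Made-up scores, for illustrating the reported diff only.
st_main_score = 0.7812    # SentenceTransformers, torch.float32
vllm_main_score = 0.7815  # vLLM, default dtype

diff = st_main_score - vllm_main_score
# diff < 0  ->  the vLLM run scored higher than the fp32 baseline.
print(f"diff = {diff:+.4f}")
```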
This MTEB test can distinguish whether the fp32 head is used; give me an emoji if you think it's cool.