[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). #26414
Signed-off-by: wang.yuqi <noooop@126.com>
Documentation preview: https://vllm--26414.org.readthedocs.build/en/26414/
examples/online_serving/pooling/openai_embedding_embed_dtype_client.py: Are you OK with this API? Yes, and this PR can even use fp8. The small-scale test results are quite good; a more detailed test will follow tomorrow.
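For readers of the example client, here is a minimal stdlib-only sketch of turning a base64 embedding back into floats. It assumes the payload is a packed little-endian array of the requested dtype; the helpers `decode_embedding` and `encode_embedding` are hypothetical and not part of vLLM, and how the dtype is requested on the server side (the example filename suggests an `embed_dtype` field) is not shown. fp8 is not covered because Python's `struct` has no fp8 format code.

```python
import base64
import struct

# Struct format code and bytes per element for the dtypes this sketch handles.
# 'f' = IEEE 754 float32, 'e' = IEEE 754 float16 (half precision).
_FORMATS = {"float32": ("f", 4), "float16": ("e", 2)}


def decode_embedding(b64: str, dtype: str = "float32", byteorder: str = "<") -> list[float]:
    """Decode a base64 string into floats (hypothetical helper).

    Assumes the payload is a packed array of `dtype` values in the given
    byte order ('<' little-endian, '>' big-endian).
    """
    code, size = _FORMATS[dtype]
    raw = base64.b64decode(b64)
    if len(raw) % size:
        raise ValueError(f"payload length {len(raw)} is not a multiple of {size}")
    count = len(raw) // size
    return list(struct.unpack(f"{byteorder}{count}{code}", raw))


def encode_embedding(values: list[float], dtype: str = "float32", byteorder: str = "<") -> str:
    """Inverse of decode_embedding, useful for round-trip checks."""
    code, _ = _FORMATS[dtype]
    raw = struct.pack(f"{byteorder}{len(values)}{code}", *values)
    return base64.b64encode(raw).decode("ascii")
```

The payload saving is easy to check: three values encode to 16 base64 characters as float32 (12 bytes) but only 8 as float16 (6 bytes).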
@noooop Yes. There is also an optional enhancement: binary protocols, such as the one used by Postgres, expect big-endian numbers, which is the de facto network byte order for almost all binary protocols, whereas models typically operate in little-endian format. Because a byte-order conversion is therefore always necessary, adding an endian parameter would also be useful.
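To make the endianness point concrete, here is a minimal sketch of the conversion an endian parameter would imply: reversing the bytes of every fixed-width element. The helper `swap_byteorder` is hypothetical. For float32 the stdlib's `array.array('f').byteswap()` does the same job, but `array` has no float16 typecode, so the sketch swaps bytes manually.

```python
import struct


def swap_byteorder(raw: bytes, itemsize: int) -> bytes:
    """Reverse the byte order of every `itemsize`-byte element in `raw`.

    Works for any fixed-width dtype (2 bytes for float16/bfloat16,
    4 bytes for float32) and converts little-endian <-> big-endian.
    """
    if len(raw) % itemsize:
        raise ValueError(f"buffer length {len(raw)} is not a multiple of {itemsize}")
    out = bytearray(len(raw))
    for start in range(0, len(raw), itemsize):
        out[start:start + itemsize] = raw[start:start + itemsize][::-1]
    return bytes(out)


# float16 1.0 is 0x3C00: little-endian bytes b"\x00\x3c", big-endian b"\x3c\x00".
little = struct.pack("<3e", 1.0, -2.0, 0.5)
big = swap_byteorder(little, 2)
# Decoding with the matching byte order recovers the values; decoding with
# the wrong one silently yields different (here, tiny subnormal) numbers.
print(struct.unpack(">3e", big), struct.unpack("<3e", big))
```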
cc @DarkLight1337 @maxdebayser Ready for review
Awesome. I've left a few comments but this looks good to me.
Is there anything else that needs to be modified in this PR?
Nope, this LGTM now, thanks.
…e64 (Still uses fp32 by default). (vllm-project#26414) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: 1994 <1994@users.noreply.github.com>
…e64 (Still uses fp32 by default). (vllm-project#26414) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
…e64 (Still uses fp32 by default). (vllm-project#26414) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev>
…e64 (Still uses fp32 by default). (vllm-project#26414) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
…e64 (Still uses fp32 by default). (vllm-project#26414) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
…e64 (Still uses fp32 by default). (vllm-project#26414) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Improve all pooling task
These PRs mostly conflict with one another, so grouping them into a series makes it easier for reviewers to follow what has changed and what still needs to be done afterwards.
Purpose
FIX #26248
MTEB test (PTAL #17175):
https://github.com/noooop/snippet/blob/main/benchmarks/test_mteb/test_embed_dtype.py
float32 ≈ float16 > bfloat16 > fp8_e4m3 >> fp8_e5m2
Even with fp8_e5m2, the gap is smaller than expected.
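One plausible reading of this ordering is mantissa width: float16 keeps 10 mantissa bits, bfloat16 7, fp8_e4m3 3 and fp8_e5m2 2, while for typical L2-normalized embeddings (values well inside float16's range) bfloat16's extra exponent bits buy little. The sketch below emulates float16 and bfloat16 rounding with the stdlib on a synthetic normalized vector (not real model output) and reports max absolute error and cosine distance. It illustrates the argument only and does not reproduce the MTEB benchmark; fp8 is omitted because Python has no built-in codec for it, and the helper names are hypothetical.

```python
import math
import struct


def round_fp16(x: float) -> float:
    # IEEE 754 half precision: 1 sign, 5 exponent, 10 mantissa bits.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_bf16(x: float) -> float:
    # bfloat16 = top 16 bits of float32 (1 sign, 8 exponent, 7 mantissa bits).
    # Round-to-nearest-even on the 16 dropped bits; NaN/inf are not handled.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = ((bits + 0x7FFF + ((bits >> 16) & 1)) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Deterministic stand-in for an L2-normalized embedding (NOT real model output).
raw = [math.sin(0.7 * i) + 0.3 * math.cos(1.3 * i) for i in range(256)]
norm = math.sqrt(sum(v * v for v in raw))
ref = [v / norm for v in raw]

results = {}
for name, rnd in (("float16", round_fp16), ("bfloat16", round_bf16)):
    approx = [rnd(v) for v in ref]
    max_err = max(abs(a - b) for a, b in zip(ref, approx))
    results[name] = (max_err, 1.0 - cosine(ref, approx))
    print(f"{name}: max |error| = {max_err:.2e}, 1 - cosine = {results[name][1]:.2e}")
```

Both distances are small, which is consistent with the observation that even narrow formats lose less retrieval quality than one might expect.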
Test Plan
tests/entrypoints/pooling/openai/test_embedding.py
tests/entrypoints/pooling/openai/test_pooling.py
Test Result
pass