[Misc] Update embedding/cross encoder tests to use mteb v2
#27329
Conversation
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Code Review
This pull request updates the mteb version to v2 and adjusts the tests accordingly. The changes include updating dependencies in requirements/test.in and requirements/test.txt, modifying the mteb_utils.py file to adapt to the new MTEB API, and removing unnecessary code in test_cross_encoder.py and other test files. The review focuses on ensuring the correctness of the updated code and adherence to best practices.
💡 Codex Review
vllm/tests/models/language/pooling_mteb_test/mteb_utils.py
Lines 299 to 306 in 23bd209
def mteb_test_rerank_models(
    hf_runner,
    vllm_runner,
    model_info: RerankModelInfo,
    vllm_extra_kwargs=None,
    hf_model_callback=None,
    vllm_mteb_encoder=VllmMtebEncoder,
    atol=MTEB_RERANK_TOL,
mteb_test_rerank_models still defaults to VllmMtebEncoder, but after the refactor this class only implements the embedding protocol and no longer defines predict. The rerank pathway now expects an object implementing mteb.CrossEncoderProtocol, so calling the tests with the default will raise an AttributeError when mteb.evaluate tries to call predict. The default should be updated to the new VllmMtebCrossEncoder (or another cross encoder) so the rerank tests can execute.
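A minimal sketch of the suggested fix, using only the names cited in this review (the function body is elided; VllmMtebCrossEncoder is the new cross-encoder wrapper the refactor introduced):

```python
def mteb_test_rerank_models(
    hf_runner,
    vllm_runner,
    model_info: RerankModelInfo,
    vllm_extra_kwargs=None,
    hf_model_callback=None,
    vllm_mteb_encoder=VllmMtebCrossEncoder,  # was VllmMtebEncoder, which no longer defines predict()
    atol=MTEB_RERANK_TOL,
):
    ...
```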
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger a full CI run by default. Instead, it would only run fastcheck CI, which covers only a small and essential subset of CI tests to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Congratulations on the release of MTEB v2! MTEB testing helps align vLLM with the sentence-transformers implementation, and identifying potential numerical precision issues is becoming increasingly important. Enabling CI will help you quickly find failing tests.

cc @DarkLight1337 Please help unblock the Language Models Test (MTEB).
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
https://buildkite.com/vllm/ci/builds/35829/steps/canvas?sid=019a0a13-ef28-4260-87f8-b6f4d685791a

@Isotr0py Today's MTEB CI failure may be related to #27303. Please help fix it.

https://buildkite.com/vllm/ci/builds/35670/steps/canvas?sid=019a04ed-a170-4d3e-bdd2-0fb84975d966

Yesterday's build still passed.
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
MTEB-related tests are now passing. I think two-stage reranking can be sped up by changing
Signed-off-by: wang.yuqi <noooop@126.com>
models/language/pooling/test_classification.py::test_models[float-jason9693/Qwen2.5-1.5B-apeach] in buildkite/ci/pr/language-models-tests-extra-standard-1 still fails: the FP32 precision on this machine did not meet expectations. I think it might have triggered a CI bug (very likely caused by the torch.compile cache). How about force-merging this PR after the 0.11.1 release? I will fix it asap if main still fails.
Can you just disable CUDA graph for that test and add a
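For reference, a hedged sketch of what that could look like, assuming the vllm_runner test fixture forwards keyword arguments to vllm.LLM; enforce_eager=True is the standard way to make vLLM skip CUDA graph capture:

```python
# Illustrative only; the fixture plumbing is assumed, not taken from this PR.
with vllm_runner(
    "jason9693/Qwen2.5-1.5B-apeach",
    enforce_eager=True,  # run eagerly, i.e. without CUDA graphs
) as vllm_model:
    ...
```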
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Disabling CUDA graph doesn't work; try torch.set_float32_matmul_precision("highest").

I saw the logs below; not sure if there's any impact. (╯‵□′)╯︵┻━┻

cc @Samoed We have some thresholds at a very extreme limit that need the highest precision to pass, although more strange error messages are appearing. torch.set_float32_matmul_precision("highest") allows models/language/pooling/test_classification.py::test_models[float-jason9693/Qwen2.5-1.5B-apeach] in buildkite/ci/pr/language-models-tests-extra-standard-1 to pass successfully.
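A minimal sketch of how that precision pin could be scoped to a single test with a pytest fixture (the fixture itself is illustrative, not code from this PR):

```python
import pytest
import torch

@pytest.fixture
def fp32_matmul_highest():
    # Force full-precision FP32 matmuls (disables TF32) for the duration
    # of a test, then restore the previous setting.
    prev = torch.get_float32_matmul_precision()
    torch.set_float32_matmul_precision("highest")
    yield
    torch.set_float32_matmul_precision(prev)
```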
Signed-off-by: wang.yuqi <noooop@126.com>
I will remove it then |
Signed-off-by: wang.yuqi <noooop@126.com>
I updated MTEB just to keep it up to date, so I’m not in a hurry. I can help with anything if you need support.
I believe this will pass the tests now, but let’s hold off on merging until after vLLM 0.11.1 is released.
MTEB v2 was released, and I've updated the tests to follow the new API.
Purpose
Follow the mteb v2 API.
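For context, a minimal sketch of the v2 entry point the updated tests target; the model and task names are placeholders, assuming mteb>=2.0 (v2 replaces the old MTEB(tasks=...).run(model) flow with mteb.evaluate):

```python
import mteb

# Placeholder model: the vLLM tests wrap the engine in an adapter that
# implements mteb's encoder protocol; any supported model name works here.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")
tasks = mteb.get_tasks(tasks=["STS12"])
results = mteb.evaluate(model, tasks=tasks)
```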
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.