Conversation


@Samoed Samoed commented Oct 22, 2025

MTEB v2 was released, and I've updated the tests to follow the new API.

Purpose

Follow the mteb v2 API.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as the test commands used.
  • The test results, such as a before/after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
@Samoed Samoed requested a review from noooop as a code owner October 22, 2025 09:40
@Samoed Samoed changed the title update mteb version Update mteb tests to use mteb v2 Oct 22, 2025
@mergify mergify bot added the ci/build label Oct 22, 2025
@Samoed Samoed changed the title Update mteb tests to use mteb v2 Update embedding/cross encoder tests to use mteb v2 Oct 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the mteb version to v2 and adjusts the tests accordingly. The changes include updating dependencies in requirements/test.in and requirements/test.txt, modifying the mteb_utils.py file to adapt to the new MTEB API, and removing unnecessary code in test_cross_encoder.py and other test files. The review focuses on ensuring the correctness of the updated code and adherence to best practices.

@Samoed Samoed changed the title Update embedding/cross encoder tests to use mteb v2 [Misc] Update embedding/cross encoder tests to use mteb v2 Oct 22, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

```python
def mteb_test_rerank_models(
    hf_runner,
    vllm_runner,
    model_info: RerankModelInfo,
    vllm_extra_kwargs=None,
    hf_model_callback=None,
    vllm_mteb_encoder=VllmMtebEncoder,
    atol=MTEB_RERANK_TOL,
):
```

P1: Use cross encoder class for rerank tests

mteb_test_rerank_models still defaults to VllmMtebEncoder, but after the refactor this class only implements the embedding protocol and no longer defines predict. The rerank pathway now expects an object implementing mteb.CrossEncoderProtocol, so calling the tests with the default will raise an AttributeError when mteb.evaluate tries to call predict. The default should be updated to the new VllmMtebCrossEncoder (or another cross encoder) so the rerank tests can execute.
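The mismatch Codex flags can be sketched without any mteb dependency. Below is a minimal, self-contained illustration of the issue: the class and function names mirror those in the PR (`VllmMtebEncoder`, `VllmMtebCrossEncoder`, `mteb_test_rerank_models`), but the protocol stub and bodies are simplified stand-ins, not the real vLLM/mteb code.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CrossEncoderProtocol(Protocol):
    """Stand-in for mteb.CrossEncoderProtocol; only the method shape matters here."""

    def predict(self, sentences): ...


class VllmMtebEncoder:
    """Embedding-only encoder: after the refactor it no longer defines predict()."""

    def encode(self, sentences):
        return [[0.0] * 4 for _ in sentences]


class VllmMtebCrossEncoder(VllmMtebEncoder):
    """Adds predict(), so it satisfies the cross-encoder protocol."""

    def predict(self, sentences):
        return [0.0 for _ in sentences]


# Suggested fix: default the rerank tests to the cross encoder, so the
# rerank pathway finds predict() instead of raising AttributeError.
def mteb_test_rerank_models(model_info, vllm_mteb_encoder=VllmMtebCrossEncoder):
    encoder = vllm_mteb_encoder()
    assert isinstance(encoder, CrossEncoderProtocol), "rerank needs predict()"
    return encoder
```

With the old default (`VllmMtebEncoder`), the `isinstance` check fails because the class exposes only `encode`; swapping the default to `VllmMtebCrossEncoder` restores the `predict` entry point the rerank evaluator calls.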


@noooop noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 22, 2025
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@noooop
Collaborator

noooop commented Oct 22, 2025

Congratulations on the release of MTEB v2 !

MTEB testing helps align vLLM with the sentence-transformers implementation, and identifying potential numerical-precision issues is becoming increasingly important.


I've enabled CI to help you quickly find failing tests.

cc @DarkLight1337 Please help unblock Language Models Test (MTEB)

DarkLight1337 and others added 3 commits October 22, 2025 17:52
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
@noooop
Collaborator

noooop commented Oct 22, 2025

https://buildkite.com/vllm/ci/builds/35829/steps/canvas?sid=019a0a13-ef28-4260-87f8-b6f4d685791a

@Isotr0py Today's Mteb CI failure may be related to #27303. Please help fix it.

Yesterday's run still passed: https://buildkite.com/vllm/ci/builds/35670/steps/canvas?sid=019a04ed-a170-4d3e-bdd2-0fb84975d966

Samoed and others added 5 commits October 22, 2025 17:10
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
@Samoed
Author

Samoed commented Oct 23, 2025

MTEB-related tests are now passing. I think two-stage reranking can be sped up by switching NFCorpus to NanoNFCorpus and selecting a smaller top_k, because it currently runs for more than an hour.
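A back-of-the-envelope sketch of why this suggestion helps: second-stage rerank cost scales with the number of queries times top_k. The sizes below are illustrative assumptions (NFCorpus has roughly 323 test queries; the Nano task variants keep about 50 queries), and the top_k values are hypothetical, not taken from the actual test config.

```python
def rerank_pairs(num_queries: int, top_k: int) -> int:
    # Second-stage reranking does one cross-encoder forward pass per
    # (query, candidate) pair, so work scales with num_queries * top_k.
    return num_queries * top_k


# Assumed sizes, for illustration only:
full = rerank_pairs(323, 1000)  # NFCorpus with a large top_k
nano = rerank_pairs(50, 100)    # NanoNFCorpus with a smaller top_k
print(full, nano)  # 323000 5000 -- roughly a 65x reduction in pairs
```

Under these assumptions, shrinking both the query set and top_k cuts the number of cross-encoder forward passes by well over an order of magnitude, which is where the hour-plus runtime goes.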

@noooop
Collaborator

noooop commented Oct 28, 2025

models/language/pooling/test_classification.py::test_models[float-jason9693/Qwen2.5-1.5B-apeach] in buildkite/ci/pr/language-models-tests-extra-standard-1 still fails:

the FP32 precision on this machine did not meet expectations.

I think it might have triggered a CI bug (very likely caused by the torch.compile cache).

How about force-merging this PR after the 0.11.1 release? I will fix it ASAP if main still fails.

cc @DarkLight1337 @hmellor

@DarkLight1337
Member

Can you just disable CUDA graph for that test and add a FIXME?

Signed-off-by: wang.yuqi <noooop@126.com>
@noooop
Copy link
Collaborator

noooop commented Oct 29, 2025

Disabling CUDA graph doesn't work; trying torch.set_float32_matmul_precision("highest").

I saw the logs below, not sure if there's any impact.

[2025-10-29 08:39:06] INFO config.py:66: Polars version 1.34.0 available.
[2025-10-29 08:39:06] INFO retrieval_metrics.py:20: Setting torch float32 matmul precision to high for a speedup

https://github.com/embeddings-benchmark/mteb/blob/8189108e49bf2dd8e7b0121f72106e7333fcbe6f/mteb/_evaluators/retrieval_metrics.py#L19

(╯‵□′)╯︵┻━┻

cc @Samoed
Please do not set torch.set_float32_matmul_precision.

We have some thresholds at a very extreme limit that need the highest precision to pass.

Although more strange error messages are appearing, torch.set_float32_matmul_precision("highest") does allow models/language/pooling/test_classification.py::test_models[float-jason9693/Qwen2.5-1.5B-apeach] in buildkite/ci/pr/language-models-tests-extra-standard-1 to pass.
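The underlying hazard here is a global knob: mteb's retrieval_metrics lowers the float32 matmul precision to "high" for speed, and that setting leaks into later precision-sensitive tests. One way to contain such a change is a save/restore context manager. This is a minimal sketch using stand-in getter/setter functions; the real PyTorch calls are torch.get_float32_matmul_precision() and torch.set_float32_matmul_precision(), and wiring them in here is left as an exercise.

```python
from contextlib import contextmanager

# Stand-in for torch's global matmul-precision knob; only the
# save/restore pattern is the point of this sketch.
_precision = "highest"


def get_precision() -> str:
    return _precision


def set_precision(value: str) -> None:
    global _precision
    _precision = value


@contextmanager
def matmul_precision(value: str):
    # Save the current setting, override it, and always restore it,
    # so a library that lowers precision for speed cannot leak the
    # change into later tests with tight numerical thresholds.
    saved = get_precision()
    set_precision(value)
    try:
        yield
    finally:
        set_precision(saved)


with matmul_precision("high"):
    assert get_precision() == "high"      # library runs at lowered precision
assert get_precision() == "highest"       # restored for the strict thresholds
```

The `finally` block is what makes this safe: even if the wrapped evaluation raises, the original precision comes back, which is exactly the guarantee a bare torch.set_float32_matmul_precision call does not give.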

Signed-off-by: wang.yuqi <noooop@126.com>
@Samoed
Author

Samoed commented Oct 29, 2025

Please do not set torch.set_float32_matmul_precision

I will remove it then

Samoed and others added 3 commits October 29, 2025 09:53
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
@noooop
Collaborator

noooop commented Oct 30, 2025

Sorry, we can’t merge this PR because the MTEB tests are failing on main. I’ll merge it once the main branch is fixed. @Samoed

PTAL #27724

@Samoed
Author

Samoed commented Oct 30, 2025

I updated MTEB just to keep it up to date, so I’m not in a hurry. I can help with anything if you need support.

@noooop noooop enabled auto-merge (squash) October 31, 2025 11:46
@noooop noooop disabled auto-merge October 31, 2025 12:16
@noooop
Collaborator

noooop commented Oct 31, 2025

I believe this will pass the tests now, but let's hold off on merging until after vLLM 0.11.1 is released.


Labels

  • ci/build
  • qwen — Related to Qwen models
  • ready — ONLY add when PR is ready to merge/full CI is needed


5 participants