Conversation


@Samoed Samoed commented Oct 22, 2025

MTEB v2 was released, and I've updated the tests to follow the new API.

Purpose

Follow the mteb v2 API.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as the test commands used.
  • The test results, such as a before/after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
@Samoed Samoed requested a review from noooop as a code owner October 22, 2025 09:40
@Samoed Samoed changed the title update mteb version Update mteb tests to use mteb v2 Oct 22, 2025
@mergify mergify bot added the ci/build label Oct 22, 2025
@Samoed Samoed changed the title Update mteb tests to use mteb v2 Update embedding/cross encoder tests to use mteb v2 Oct 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the mteb version to v2 and adjusts the tests accordingly. The changes include updating dependencies in requirements/test.in and requirements/test.txt, modifying the mteb_utils.py file to adapt to the new MTEB API, and removing unnecessary code in test_cross_encoder.py and other test files. The review focuses on ensuring the correctness of the updated code and adherence to best practices.

@Samoed Samoed changed the title Update embedding/cross encoder tests to use mteb v2 [Misc] Update embedding/cross encoder tests to use mteb v2 Oct 22, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

```python
def mteb_test_rerank_models(
    hf_runner,
    vllm_runner,
    model_info: RerankModelInfo,
    vllm_extra_kwargs=None,
    hf_model_callback=None,
    vllm_mteb_encoder=VllmMtebEncoder,
    atol=MTEB_RERANK_TOL,
):
```

P1: Use cross encoder class for rerank tests

mteb_test_rerank_models still defaults to VllmMtebEncoder, but after the refactor this class only implements the embedding protocol and no longer defines predict. The rerank pathway now expects an object implementing mteb.CrossEncoderProtocol, so calling the tests with the default will raise an AttributeError when mteb.evaluate tries to call predict. The default should be updated to the new VllmMtebCrossEncoder (or another cross encoder) so the rerank tests can execute.
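The mismatch Codex flags can be sketched without any mteb dependency. Below is a minimal, self-contained illustration of the issue: the class and function names mirror those in the PR (`VllmMtebEncoder`, `VllmMtebCrossEncoder`, `mteb_test_rerank_models`), but the protocol stub and bodies are simplified stand-ins, not the real vLLM/mteb code.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CrossEncoderProtocol(Protocol):
    """Stand-in for mteb.CrossEncoderProtocol; only the method shape matters here."""

    def predict(self, sentences): ...


class VllmMtebEncoder:
    """Embedding-only encoder: after the refactor it no longer defines predict()."""

    def encode(self, sentences):
        return [[0.0] * 4 for _ in sentences]


class VllmMtebCrossEncoder(VllmMtebEncoder):
    """Adds predict(), so it satisfies the cross-encoder protocol."""

    def predict(self, sentences):
        return [0.0 for _ in sentences]


# Suggested fix: default the rerank tests to the cross encoder, so the
# rerank pathway finds predict() instead of raising AttributeError.
def mteb_test_rerank_models(model_info, vllm_mteb_encoder=VllmMtebCrossEncoder):
    encoder = vllm_mteb_encoder()
    assert isinstance(encoder, CrossEncoderProtocol), "rerank needs predict()"
    return encoder
```

With the old default (`VllmMtebEncoder`), the `isinstance` check fails because the class exposes only `encode`; swapping the default to `VllmMtebCrossEncoder` restores the `predict` entry point the rerank evaluator calls.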


@noooop noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 22, 2025
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@noooop
Collaborator

noooop commented Oct 22, 2025

Congratulations on the release of MTEB v2 !

MTEB testing helps align vLLM with the sentence-transformers implementation, and identifying potential numerical-precision issues is becoming increasingly important.


I've enabled CI to help you quickly find failing tests.

cc @DarkLight1337 Please help unblock Language Models Test (MTEB)

DarkLight1337 and others added 3 commits October 22, 2025 17:52
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
@noooop
Collaborator

noooop commented Oct 22, 2025

https://buildkite.com/vllm/ci/builds/35829/steps/canvas?sid=019a0a13-ef28-4260-87f8-b6f4d685791a

@Isotr0py Today's Mteb CI failure may be related to #27303. Please help fix it.

Yesterday's run still passed: https://buildkite.com/vllm/ci/builds/35670/steps/canvas?sid=019a04ed-a170-4d3e-bdd2-0fb84975d966

Samoed and others added 5 commits October 22, 2025 17:10
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
@Samoed
Author

Samoed commented Oct 23, 2025

MTEB-related tests are now passing. I think two-stage reranking can be sped up by switching NFCorpus to NanoNFCorpus and selecting a smaller top_k, because it currently runs for more than an hour.
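A back-of-the-envelope sketch of why this suggestion helps: second-stage rerank cost scales with the number of queries times top_k. The sizes below are illustrative assumptions (NFCorpus has roughly 323 test queries; the Nano task variants keep about 50 queries), and the top_k values are hypothetical, not taken from the actual test config.

```python
def rerank_pairs(num_queries: int, top_k: int) -> int:
    # Second-stage reranking does one cross-encoder forward pass per
    # (query, candidate) pair, so work scales with num_queries * top_k.
    return num_queries * top_k


# Assumed sizes, for illustration only:
full = rerank_pairs(323, 1000)  # NFCorpus with a large top_k
nano = rerank_pairs(50, 100)    # NanoNFCorpus with a smaller top_k
print(full, nano)  # 323000 5000 -- roughly a 65x reduction in pairs
```

Under these assumptions, shrinking both the query set and top_k cuts the number of cross-encoder forward passes by well over an order of magnitude, which is where the hour-plus runtime goes.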

@noooop
Collaborator

noooop commented Oct 28, 2025

models/language/pooling/test_classification.py::test_models[float-jason9693/Qwen2.5-1.5B-apeach] in buildkite/ci/pr/language-models-tests-extra-standard-1 still fails:

the FP32 precision on this machine did not meet expectations.

I think it might have triggered a CI bug (very likely caused by the torch.compile cache).

How about force-merging this PR after the 0.11.1 release? I will fix it ASAP if main still fails.

cc @DarkLight1337 @hmellor

@DarkLight1337
Member

Can you just disable CUDA graph for that test and add a FIXME?

Signed-off-by: wang.yuqi <noooop@126.com>
@noooop
Copy link
Collaborator

noooop commented Oct 29, 2025

Disabling CUDA graph doesn't work; trying torch.set_float32_matmul_precision("highest").

I saw the logs below, not sure if there's any impact.

[2025-10-29 08:39:06] INFO config.py:66: Polars version 1.34.0 available.
[2025-10-29 08:39:06] INFO retrieval_metrics.py:20: Setting torch float32 matmul precision to high for a speedup

https://github.com/embeddings-benchmark/mteb/blob/8189108e49bf2dd8e7b0121f72106e7333fcbe6f/mteb/_evaluators/retrieval_metrics.py#L19

(╯‵□′)╯︵┻━┻

cc @Samoed
Please do not set torch.set_float32_matmul_precision.

We have some thresholds at a very extreme limit that need the highest precision to pass.

Although more strange error messages are appearing, torch.set_float32_matmul_precision("highest") does allow models/language/pooling/test_classification.py::test_models[float-jason9693/Qwen2.5-1.5B-apeach] in buildkite/ci/pr/language-models-tests-extra-standard-1 to pass.
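The underlying hazard here is a global knob: mteb's retrieval_metrics lowers the float32 matmul precision to "high" for speed, and that setting leaks into later precision-sensitive tests. One way to contain such a change is a save/restore context manager. This is a minimal sketch using stand-in getter/setter functions; the real PyTorch calls are torch.get_float32_matmul_precision() and torch.set_float32_matmul_precision(), and wiring them in here is left as an exercise.

```python
from contextlib import contextmanager

# Stand-in for torch's global matmul-precision knob; only the
# save/restore pattern is the point of this sketch.
_precision = "highest"


def get_precision() -> str:
    return _precision


def set_precision(value: str) -> None:
    global _precision
    _precision = value


@contextmanager
def matmul_precision(value: str):
    # Save the current setting, override it, and always restore it,
    # so a library that lowers precision for speed cannot leak the
    # change into later tests with tight numerical thresholds.
    saved = get_precision()
    set_precision(value)
    try:
        yield
    finally:
        set_precision(saved)


with matmul_precision("high"):
    assert get_precision() == "high"      # library runs at lowered precision
assert get_precision() == "highest"       # restored for the strict thresholds
```

The `finally` block is what makes this safe: even if the wrapped evaluation raises, the original precision comes back, which is exactly the guarantee a bare torch.set_float32_matmul_precision call does not give.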

Signed-off-by: wang.yuqi <noooop@126.com>
@Samoed
Author

Samoed commented Oct 29, 2025

Please do not set torch.set_float32_matmul_precision

I will remove it then

Samoed and others added 3 commits October 29, 2025 09:53
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
@noooop
Collaborator

noooop commented Oct 30, 2025

Sorry, we can’t merge this PR because the MTEB tests are failing on main. I’ll merge it once the main branch is fixed. @Samoed

PTAL #27724

@Samoed
Author

Samoed commented Oct 30, 2025

I updated MTEB just to keep it up to date, so I’m not in a hurry. I can help with anything if you need support.

@noooop noooop enabled auto-merge (squash) October 31, 2025 11:46
@noooop noooop disabled auto-merge October 31, 2025 12:16
@noooop
Collaborator

noooop commented Oct 31, 2025

I believe this will pass the tests now, but let's hold off on merging until after vLLM 0.11.1 is released.


Labels

  • ci/build
  • qwen — Related to Qwen models
  • ready — ONLY add when PR is ready to merge/full CI is needed


5 participants