Conversation

noooop
Contributor

@noooop noooop commented Aug 14, 2025

Purpose

Fix another flaky test by increasing tolerance. Related to #22862

FIX #22923

cc @maxdebayser @DarkLight1337

  • mteb_test now uses enforce_eager
  • Increase MTEB_EMBED_TOL from 1e-4 to 0.02
  • Add print("Model:", model_info.name) so the model name can be extracted from the log with a regular expression:
```python
import re

# Parse model names and MTEB main scores out of the test log.
with open("name.log") as f:
    log = f.read()

pattern = re.compile(
    r'^.*Model: (.+)\n.*VLLM: (.+) (.+)\n.+SentenceTransformers: (.+) (.+)\n.+Difference: (.+)\n',
    re.M)

for name, dtype, vllm_main_score, st_dtype, st_main_score, diff in pattern.findall(log):
    print(name, dtype, vllm_main_score, st_dtype, st_main_score, diff)
```
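For context, the relaxed tolerance amounts to a simple absolute-difference check against the SentenceTransformers reference score. A minimal sketch (the helper name and structure are illustrative, not the actual vLLM test code):

```python
# Sketch of how the relaxed tolerance is applied; `scores_match` is an
# illustrative name, not a helper from the vLLM test suite.
MTEB_EMBED_TOL = 0.02  # raised from 1e-4 by this PR

def scores_match(vllm_main_score: float, st_main_score: float,
                 tol: float = MTEB_EMBED_TOL) -> bool:
    # The vLLM score must stay within `tol` of the
    # SentenceTransformers (float32) reference score.
    return abs(vllm_main_score - st_main_score) < tol

# Values taken from the jinaai/jina-embeddings-v3 row of the baseline
# posted later in this thread.
print(scores_match(0.824464932, 0.824413164))  # True
```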

Test Plan

mteb_test_embed_models

Test Result

pass

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a change to default pooling models to use eager execution, addressing potential numerical precision issues with torch.compile. The implementation correctly makes the enforce_eager configuration optional and sets its default value based on the model's runner type. My review identified a critical issue where a new assertion could cause a crash if model_config is not set. I have provided a suggestion to fix this issue. Otherwise, the changes are sound.

@noooop noooop force-pushed the pooling_enforce_eager branch from 847ad4a to 88787ba Compare August 14, 2025 05:51
@noooop noooop marked this pull request as draft August 14, 2025 06:03
@noooop noooop force-pushed the pooling_enforce_eager branch from cf40990 to 39f9f90 Compare August 14, 2025 06:04
@noooop noooop changed the title [Model] Pooling models default to using enforce_eager [WIP] Try to fix numerical issues in embedding models Aug 14, 2025
@noooop noooop closed this Aug 14, 2025
@noooop noooop reopened this Aug 15, 2025
@noooop noooop force-pushed the pooling_enforce_eager branch from 39f9f90 to d3c9154 Compare August 15, 2025 02:58
@noooop noooop marked this pull request as ready for review August 15, 2025 03:00
@noooop noooop changed the title [WIP] Try to fix numerical issues in embedding models [CI] Pooling models mteb test uses enforce_eager Aug 15, 2025
@noooop
Contributor Author

noooop commented Aug 15, 2025

cc @DarkLight1337

@noooop noooop mentioned this pull request Aug 15, 2025
Member

@DarkLight1337 DarkLight1337 left a comment


Yeah let's increase the tolerance for now, seems that there are more and more tests that fail the CI

@noooop
Contributor Author

noooop commented Aug 15, 2025

> Yeah let's increase the tolerance for now, seems that there are more and more tests that fail the CI

First, this excludes the effects of torch.compile, which may be the biggest difference between v0 and v1.

I plan to collect numerical-precision statistics over the long term to see which factors affect precision.
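One way such long-term statistics could be gathered (a hypothetical sketch; none of these names come from the vLLM code base): record the per-run vLLM-vs-SentenceTransformers score difference for each model and summarize the history.

```python
from statistics import mean

# Hypothetical sketch for tracking precision drift across CI runs.
def summarize_diffs(history: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    # Map model name -> (mean diff, max diff) over all recorded runs.
    return {name: (mean(d), max(d)) for name, d in history.items()}

history = {
    "BAAI/bge-m3": [0.00002, 0.00003],
    "thenlper/gte-large": [0.00002],
}
print(summarize_diffs(history))
```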

@noooop
Contributor Author

noooop commented Aug 15, 2025

@DarkLight1337

Let's observe for a long time.

20250815 baseline:

| name | dtype | vllm_main_score | st_dtype | st_main_score | diff |
|------|-------|-----------------|----------|---------------|------|
| jinaai/jina-embeddings-v3 | torch.bfloat16 | 0.824464932 | torch.float32 | 0.824413164 | 0.00005 |
| BAAI/bge-m3 | torch.float16 | 0.787319813 | torch.float32 | 0.787343078 | 0.00002 |
| intfloat/multilingual-e5-base | torch.float16 | 0.779364278 | torch.float32 | 0.779325955 | 0.00004 |
| BAAI/bge-base-en | torch.float16 | 0.779340357 | torch.float32 | 0.779336792 | 0.00000 |
| Alibaba-NLP/gte-multilingual-base | torch.float16 | 0.775063329 | torch.float32 | 0.775074696 | 0.00001 |
| Qwen/Qwen3-Embedding-0.6B | torch.float32 | 0.771150305 | torch.float32 | 0.771163695 | 0.00001 |
| thenlper/gte-large | torch.float16 | 0.768054694 | torch.float32 | 0.76807651 | 0.00002 |
| BAAI/bge-code-v1 | torch.float32 | 0.757248666 | torch.float32 | 0.75724465 | 0.00000 |
| Alibaba-NLP/gte-modernbert-base | torch.float16 | 0.748158342 | torch.float32 | 0.748193353 | 0.00004 |
| Alibaba-NLP/gte-base-en-v1.5 | torch.float16 | 0.74427673 | torch.float32 | 0.744314039 | 0.00004 |
| intfloat/e5-small | torch.float16 | 0.74229612 | torch.float32 | 0.742285423 | 0.00001 |
| Alibaba-NLP/gte-large-en-v1.5 | torch.float16 | 0.73928701 | torch.float32 | 0.739301913 | 0.00001 |
| nomic-ai/nomic-embed-text-v1 | torch.float16 | 0.737572204 | torch.float32 | 0.737568559 | 0.00000 |
| nomic-ai/nomic-embed-text-v2-moe | torch.float16 | 0.715533087 | torch.float32 | 0.715488912 | 0.00004 |
| Snowflake/snowflake-arctic-embed-xs | torch.float16 | 0.714940761 | torch.float32 | 0.714927797 | 0.00001 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | torch.float16 | 0.712266816 | torch.float32 | 0.712258299 | 0.00001 |
| Snowflake/snowflake-arctic-embed-m-v2.0 | torch.float16 | 0.706645738 | torch.float32 | 0.706622444 | 0.00002 |
| Snowflake/snowflake-arctic-embed-m-long | torch.float16 | 0.681194513 | torch.float32 | 0.681146831 | 0.00005 |
| Snowflake/snowflake-arctic-embed-m-v1.5 | torch.float16 | 0.649126693 | torch.float32 | 0.649088363 | 0.00004 |
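The diff column above can be checked mechanically against both tolerances; a small sketch with the values copied from the table:

```python
# Diff column from the 2025-08-15 baseline table above, one entry per model.
diffs = [0.00005, 0.00002, 0.00004, 0.00000, 0.00001, 0.00001, 0.00002,
         0.00000, 0.00004, 0.00004, 0.00001, 0.00001, 0.00000, 0.00004,
         0.00001, 0.00001, 0.00002, 0.00005, 0.00004]

assert len(diffs) == 19   # one entry per model in the table
assert max(diffs) < 1e-4  # this baseline run passes even the old tolerance
assert max(diffs) < 0.02  # and is far below the new MTEB_EMBED_TOL
print(max(diffs))
```

Note that this particular run sits well within the old 1e-4 tolerance, which is consistent with the test being flaky rather than consistently failing.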

@noooop
Contributor Author

noooop commented Aug 15, 2025

@DarkLight1337

Can we merge this PR?

@DarkLight1337
Member

The test passed so sure

@vllm-bot vllm-bot merged commit 5406ebf into vllm-project:main Aug 15, 2025
16 checks passed
@noooop noooop deleted the pooling_enforce_eager branch August 18, 2025 06:00
juuice-lee pushed a commit to juuice-lee/vllm-moe.code that referenced this pull request Aug 18, 2025
yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Aug 19, 2025
divakar-amd pushed a commit to divakar-amd/vllm_upstream that referenced this pull request Aug 20, 2025
Gh0u1L5 pushed a commit to Gh0u1L5/vllm that referenced this pull request Aug 21, 2025
djmmoss pushed a commit to djmmoss/vllm that referenced this pull request Aug 21, 2025
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Aug 21, 2025
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Boyuan Feng <boyuan@meta.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
dumb0002 pushed a commit to dumb0002/vllm that referenced this pull request Aug 28, 2025
googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025
Development

Successfully merging this pull request may close these issues.

[CI Failure][NIGHTLY FIRE DRILL]: Language Models (Extended Pooling)