[Bugfix] Merge MM embeddings by index instead of token IDs #16229
Conversation
…ken ID Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
This pull request has merge conflicts that must be resolved before it can be merged.
We will not include this in the v0.11 release because of the breaking change, but it should be fine to merge this into the main branch since the release branch has already been cut.
upstream PR: vllm-project/vllm#16229 Fix is still in progress, don't merge yet --------- Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai> Signed-off-by: Chendi Xue <Chendi.Xue@intel.com> Co-authored-by: Chendi Xue <Chendi.Xue@intel.com> Signed-off-by: Iryna Boiko <iboiko@habana.ai>
…ect#16229) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: yewentao256 <zhyanwentao@126.com>
…ect#16229) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
### What this PR does / why we need it?
This is step 1 of refactoring the code to adapt to vLLM main, and this PR is aligned with vllm-project/vllm@17c540a.
1. Refactor deepseek to the latest code arch as of vllm-project/vllm@17c540a
2. Bunches of fixes due to vLLM changes
   - Fix `AscendScheduler` `__post_init__`, caused by vllm-project/vllm#25075
   - Fix `AscendScheduler` init got an unexpected arg `block_size`, caused by vllm-project/vllm#26296
   - Fix `KVCacheManager` `get_num_common_prefix_blocks` arg, caused by vllm-project/vllm#23485
   - Fix `MLAAttention` import, caused by vllm-project/vllm#25103
   - Fix `SharedFusedMoE` import, caused by vllm-project/vllm#26145
   - Fix `LazyLoader` import, caused by vllm-project/vllm#27022
   - Fix `vllm.utils.swap_dict_values` import, caused by vllm-project/vllm#26990
   - Fix `Backend` enum import, caused by vllm-project/vllm#25893
   - Fix `CompilationLevel` renaming to `CompilationMode` issue introduced by vllm-project/vllm#26355
   - Fix fused_moe ops, caused by vllm-project/vllm#24097
   - Fix bert model because of `inputs_embeds`, caused by vllm-project/vllm#25922
   - Fix MRope because of `get_input_positions_tensor` to `get_mrope_input_positions`, caused by vllm-project/vllm#24172
   - Fix `splitting_ops` changes introduced by vllm-project/vllm#25845
   - Fix multi-modality changes introduced by vllm-project/vllm#16229
   - Fix lora bias dropping issue introduced by vllm-project/vllm#25807
   - Fix structured output break introduced by vllm-project/vllm#26737

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
CI passed with existing tests.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
--------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Icey <1790571317@qq.com> Co-authored-by: Icey <1790571317@qq.com>
This PR fixes a mismatch in merging multi-modal embeddings when the model itself generates embedding placeholder tokens such as `<image>`. Although this error mainly occurs in V1, it can possibly occur in V0 as well; this PR focuses on the V1 case. For V0 users, you can work around this by setting `top_p` so that the model has no chance of generating such tokens. See the sketch after the issue list below for the idea behind the index-based merge.

FIX #15677
FIX #15764
FIX #23891
FIX #23954
FIX #24456
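As a rough illustration (not vLLM code) of why the merge strategy matters: matching by token ID breaks as soon as a generated token happens to equal the placeholder ID, whereas merging by index uses a boolean mask built from the known multimodal positions. The token ID, shapes, and tensors below are made up for the example.

```python
import torch

IMAGE_TOKEN_ID = 32000  # hypothetical placeholder token ID, for illustration only
hidden_size = 4

# Prompt: [text, <image>, <image>, text], plus one *generated* token that
# happens to decode to <image> but has no multimodal embedding attached.
input_ids = torch.tensor([1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2, IMAGE_TOKEN_ID])
text_embeds = torch.zeros(len(input_ids), hidden_size)
mm_embeds = torch.ones(2, hidden_size)  # embeddings for the two real <image> slots

# Buggy approach: merge by token ID -- 3 positions match, but only 2 embeddings exist.
bad_mask = input_ids == IMAGE_TOKEN_ID
assert bad_mask.sum().item() != mm_embeds.shape[0]  # shape mismatch -> crash or corruption

# Fixed approach: merge by index -- the runner already knows which positions
# belong to multimodal items and passes that mask alongside the embeddings.
is_mm_embed = torch.tensor([False, True, True, False, False])
merged = text_embeds.clone()
merged[is_mm_embed] = mm_embeds  # shapes line up by construction
```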
Breaking change for model developers

This PR updates `SupportsMultiModal.get_input_embeddings` to support passing an `is_multimodal` mask, and adds a default implementation so that there is no need to override it in most cases. OOT/WIP models should either remove their override to use the default implementation, or update their override to accept the `is_multimodal` and `do_language_embed_multimodal` arguments (see the sketch below).

Text-only model developers should ensure that their models implement `get_input_embeddings` to continue using them in vLLM.
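For OOT models that keep their override, the shape of the change might look like the following. This is a minimal sketch, not the actual vLLM signature: the `is_multimodal` name comes from this PR's description, while the other parameter names, defaults, and the toy module itself are assumptions.

```python
from typing import Optional

import torch
import torch.nn as nn


class MyOOTModel(nn.Module):
    """Toy stand-in for an out-of-tree model (illustrative only)."""

    def __init__(self, vocab_size: int = 128, hidden_size: int = 16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)

    def get_input_embeddings(
        self,
        input_ids: torch.Tensor,
        multimodal_embeddings: Optional[torch.Tensor] = None,  # assumed name
        is_multimodal: Optional[torch.Tensor] = None,  # boolean mask from the runner
    ) -> torch.Tensor:
        # Embed every position as text first.
        inputs_embeds = self.embed_tokens(input_ids)
        if multimodal_embeddings is not None and is_multimodal is not None:
            # Overwrite exactly the positions flagged by the index mask,
            # instead of re-deriving them from placeholder token IDs.
            inputs_embeds[is_multimodal] = multimodal_embeddings.to(inputs_embeds.dtype)
        return inputs_embeds
```

In most cases, though, the simpler path is to drop the override entirely and rely on the new default implementation.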
Breaking change for model runner plugins

In order to continue supporting multimodal models, you should update the `_gather_mm_embeddings` method to build up and return the `is_mm_embed` mask, then pass it to the model.
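A rough sketch of the runner-side idea, assuming a simplified data layout: the helper below builds the `is_mm_embed` mask over the scheduled token positions and returns it alongside the gathered embeddings. The function signature, inputs, and the trailing usage comment are illustrative, not vLLM's actual plugin API.

```python
from typing import List, Tuple

import torch


def _gather_mm_embeddings(
    mm_embeds_per_item: List[torch.Tensor],  # one embedding tensor per multimodal item
    mm_positions: List[Tuple[int, int]],     # (start, length) of each item's placeholder span
    num_scheduled_tokens: int,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Gather multimodal embeddings and mark their positions by index."""
    is_mm_embed = torch.zeros(num_scheduled_tokens, dtype=torch.bool)
    gathered: List[torch.Tensor] = []
    for embeds, (start, length) in zip(mm_embeds_per_item, mm_positions):
        # Mark exactly the scheduled positions this item occupies.
        is_mm_embed[start:start + length] = True
        gathered.append(embeds[:length])
    mm_embeds = torch.cat(gathered) if gathered else torch.empty(0)
    return mm_embeds, is_mm_embed


# The runner would then forward the mask to the model, e.g.:
# inputs_embeds = model.get_input_embeddings(
#     input_ids, multimodal_embeddings=mm_embeds, is_multimodal=is_mm_embed)
```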