[Bugfix] Merge MM embeddings by index instead of token IDs #16229
Conversation
…ken ID Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
This pull request has merge conflicts that must be resolved before it can be merged.
We will not include this in the v0.11 release because of the breaking change, but it should be fine to merge this into the main branch since the release branch has already been cut.
upstream PR: vllm-project/vllm#16229 Fix is still in progress, don't merge yet --------- Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai> Signed-off-by: Chendi Xue <Chendi.Xue@intel.com> Co-authored-by: Chendi Xue <Chendi.Xue@intel.com> Signed-off-by: Iryna Boiko <iboiko@habana.ai>
…ect#16229) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: yewentao256 <zhyanwentao@126.com>
…ect#16229) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
### What this PR does / why we need it?
This is step 1 of refactoring the code to adapt to vLLM main, and this PR is aligned with vllm-project/vllm@17c540a.
1. Refactor deepseek to the latest code arch as of vllm-project/vllm@17c540a
2. Bunches of fixes due to vLLM changes
   - Fix `AscendScheduler` `__post_init__`, caused by vllm-project/vllm#25075
   - Fix `AscendScheduler` init got an unexpected arg `block_size`, caused by vllm-project/vllm#26296
   - Fix `KVCacheManager` `get_num_common_prefix_blocks` arg, caused by vllm-project/vllm#23485
   - Fix `MLAAttention` import, caused by vllm-project/vllm#25103
   - Fix `SharedFusedMoE` import, caused by vllm-project/vllm#26145
   - Fix `LazyLoader` import, caused by vllm-project/vllm#27022
   - Fix `vllm.utils.swap_dict_values` import, caused by vllm-project/vllm#26990
   - Fix `Backend` enum import, caused by vllm-project/vllm#25893
   - Fix `CompilationLevel` renaming to `CompilationMode` issue introduced by vllm-project/vllm#26355
   - Fix fused_moe ops, caused by vllm-project/vllm#24097
   - Fix bert model because of `inputs_embeds`, caused by vllm-project/vllm#25922
   - Fix MRope because of `get_input_positions_tensor` to `get_mrope_input_positions`, caused by vllm-project/vllm#24172
   - Fix `splitting_ops` changes introduced by vllm-project/vllm#25845
   - Fix multi-modality changes introduced by vllm-project/vllm#16229
   - Fix lora bias dropping issue introduced by vllm-project/vllm#25807
   - Fix structured output break introduced by vllm-project/vllm#26737

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
CI passed with existing tests.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
--------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Icey <1790571317@qq.com> Co-authored-by: Icey <1790571317@qq.com>
This PR fixes a mismatch in merging multi-modal embeddings when the model itself generates embedding placeholder tokens such as `<image>`. Although this error mainly occurs in V1, it can possibly occur in V0 as well; this PR focuses on the V1 case. For V0 users, you can work around this by setting `top_p` so that the model has no chance of generating such tokens. See the sketch after the issue list below for the idea behind the index-based merge.

FIX #15677
FIX #15764
FIX #23891
FIX #23954
FIX #24456
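As a rough illustration (not vLLM code) of why the merge strategy matters: matching by token ID breaks as soon as a generated token happens to equal the placeholder ID, whereas merging by index uses a boolean mask built from the known multimodal positions. The token ID, shapes, and tensors below are made up for the example.

```python
import torch

IMAGE_TOKEN_ID = 32000  # hypothetical placeholder token ID, for illustration only
hidden_size = 4

# Prompt: [text, <image>, <image>, text], plus one *generated* token that
# happens to decode to <image> but has no multimodal embedding attached.
input_ids = torch.tensor([1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2, IMAGE_TOKEN_ID])
text_embeds = torch.zeros(len(input_ids), hidden_size)
mm_embeds = torch.ones(2, hidden_size)  # embeddings for the two real <image> slots

# Buggy approach: merge by token ID -- 3 positions match, but only 2 embeddings exist.
bad_mask = input_ids == IMAGE_TOKEN_ID
assert bad_mask.sum().item() != mm_embeds.shape[0]  # shape mismatch -> crash or corruption

# Fixed approach: merge by index -- the runner already knows which positions
# belong to multimodal items and passes that mask alongside the embeddings.
is_mm_embed = torch.tensor([False, True, True, False, False])
merged = text_embeds.clone()
merged[is_mm_embed] = mm_embeds  # shapes line up by construction
```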
Breaking change for model developers

This PR updates `SupportsMultiModal.get_input_embeddings` to support passing an `is_multimodal` mask, and adds a default implementation so that there is no need to override it in most cases. OOT/WIP models should either remove their override to use the default implementation, or update their override to accept the `is_multimodal` and `do_language_embed_multimodal` arguments (see the sketch below).

Text-only model developers should ensure that their models implement `get_input_embeddings` to continue using them in vLLM.
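For OOT models that keep their override, the shape of the change might look like the following. This is a minimal sketch, not the actual vLLM signature: the `is_multimodal` name comes from this PR's description, while the other parameter names, defaults, and the toy module itself are assumptions.

```python
from typing import Optional

import torch
import torch.nn as nn


class MyOOTModel(nn.Module):
    """Toy stand-in for an out-of-tree model (illustrative only)."""

    def __init__(self, vocab_size: int = 128, hidden_size: int = 16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)

    def get_input_embeddings(
        self,
        input_ids: torch.Tensor,
        multimodal_embeddings: Optional[torch.Tensor] = None,  # assumed name
        is_multimodal: Optional[torch.Tensor] = None,  # boolean mask from the runner
    ) -> torch.Tensor:
        # Embed every position as text first.
        inputs_embeds = self.embed_tokens(input_ids)
        if multimodal_embeddings is not None and is_multimodal is not None:
            # Overwrite exactly the positions flagged by the index mask,
            # instead of re-deriving them from placeholder token IDs.
            inputs_embeds[is_multimodal] = multimodal_embeddings.to(inputs_embeds.dtype)
        return inputs_embeds
```

In most cases, though, the simpler path is to drop the override entirely and rely on the new default implementation.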
Breaking change for model runner plugins

In order to continue supporting multimodal models, you should update the `_gather_mm_embeddings` method to build up and return the `is_mm_embed` mask, then pass it to the model.
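A rough sketch of the runner-side idea, assuming a simplified data layout: the helper below builds the `is_mm_embed` mask over the scheduled token positions and returns it alongside the gathered embeddings. The function signature, inputs, and the trailing usage comment are illustrative, not vLLM's actual plugin API.

```python
from typing import List, Tuple

import torch


def _gather_mm_embeddings(
    mm_embeds_per_item: List[torch.Tensor],  # one embedding tensor per multimodal item
    mm_positions: List[Tuple[int, int]],     # (start, length) of each item's placeholder span
    num_scheduled_tokens: int,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Gather multimodal embeddings and mark their positions by index."""
    is_mm_embed = torch.zeros(num_scheduled_tokens, dtype=torch.bool)
    gathered: List[torch.Tensor] = []
    for embeds, (start, length) in zip(mm_embeds_per_item, mm_positions):
        # Mark exactly the scheduled positions this item occupies.
        is_mm_embed[start:start + length] = True
        gathered.append(embeds[:length])
    mm_embeds = torch.cat(gathered) if gathered else torch.empty(0)
    return mm_embeds, is_mm_embed


# The runner would then forward the mask to the model, e.g.:
# inputs_embeds = model.get_input_embeddings(
#     input_ids, multimodal_embeddings=mm_embeds, is_multimodal=is_mm_embed)
```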