[Feature][not ready]: Batch-Level Multimodal Embedding Mask Optimization #24456

@Ruihan11

🚀 The feature, motivation and pitch

1. Motivation

Currently, merge_multimodal_embeddings scans input_ids at runtime, per request, to locate placeholder tokens. This is redundant work: the scheduler already knows the placeholder locations for every request via mm_positions. We should instead pre-compute a batch-level mask (analogous to grammar_bitmask) and pass it down, rather than re-deriving the positions by scanning at runtime.

The Problem

Two runtime scanning patterns are used today (a simplified sketch follows this list):

  1. torch.isin(input_ids, placeholder_token_id) - scans the entire input_ids tensor to locate multiple placeholder tokens
  2. (input_ids == placeholder_token_id) - scans the entire input_ids tensor to locate a single placeholder token
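
For concreteness, here is a minimal sketch of the pattern being criticized. It is illustrative Python only, not the exact vLLM code; merge_by_scanning is a made-up name standing in for the existing behavior:

```python
import torch

# Illustrative only: mirrors what the current scanning path does on each
# forward pass. The real logic lives in vllm/model_executor/models/utils.py
# (merge_multimodal_embeddings).
def merge_by_scanning(input_ids: torch.Tensor,
                      inputs_embeds: torch.Tensor,
                      multimodal_embeddings: torch.Tensor,
                      placeholder_token_id: int) -> torch.Tensor:
    # The scan: an O(num_tokens) pass over input_ids on every step,
    # recomputing positions the scheduler already knows via mm_positions.
    is_mm_token = input_ids == placeholder_token_id
    inputs_embeds[is_mm_token] = multimodal_embeddings.to(inputs_embeds.dtype)
    return inputs_embeds
```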

2. Proposed Changes

Phase 1: Core Function + Test

  • Add a merge_multimodal_embeddings_with_mask() function to utils (/vllm/model_executor/models/utils.py); one possible shape is sketched below
  • Add a unit test covering it
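
A hedged sketch of what the new helper could look like; the signature and parameter names are assumptions, not a settled API:

```python
import torch

def merge_multimodal_embeddings_with_mask(
    inputs_embeds: torch.Tensor,          # [num_tokens, hidden_size]
    multimodal_embeddings: torch.Tensor,  # [num_mm_tokens, hidden_size]
    mm_token_mask: torch.Tensor,          # [num_tokens], bool, precomputed
) -> torch.Tensor:
    # No scan of input_ids here: the boolean mask was built ahead of time
    # (e.g. by the scheduler from mm_positions), so only the scatter remains.
    inputs_embeds[mm_token_mask] = multimodal_embeddings.to(inputs_embeds.dtype)
    return inputs_embeds
```

The unit test could assert that, for random inputs, this produces the same output as the existing scan-based merge.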

Phase 2: Integration

  • Generate the mask from mm_positions in the scheduler (see the sketch after this list)
  • Replace the scanning calls with the mask-based version
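
A rough sketch of the scheduler-side mask construction. The name build_batch_mm_mask and the (start, length) tuple form are illustrative: the real mm_positions entries are per-request placeholder ranges whose offsets would need to be shifted into the batch's flattened token coordinates first:

```python
import torch

def build_batch_mm_mask(num_scheduled_tokens: int,
                        placeholder_ranges) -> torch.Tensor:
    # placeholder_ranges: iterable of (start, length) pairs assumed to be
    # already translated from per-request mm_positions offsets into the
    # batch's flattened token coordinates.
    mask = torch.zeros(num_scheduled_tokens, dtype=torch.bool)
    for start, length in placeholder_ranges:
        # Mark the whole placeholder run; no token-by-token scan needed.
        mask[start:start + length] = True
    return mask
```

Like grammar_bitmask, this would be computed once per scheduling step and handed to the model runner alongside the batch.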

Related issues: #23891, #16229

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
