[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. #20105

huachenheli · 2025-06-26T00:17:49Z

Purpose

New model additions currently suffer from two pain points:

Adding new multimodal models or modalities requires changing the hard-coded _placeholder_str function in https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/chat_utils.py#L507. This is now configurable via --mm-placeholder-str-override, so future additions can be contained within the model_executor/models/ directory without modifying vLLM framework code.
Video input processing currently uses a hard-coded 32-frame configuration. We want to allow flexible and extensible media I/O policies in the future.
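To make the first pain point concrete: an override only needs to take priority over whatever the framework would otherwise choose. Below is a minimal sketch of that lookup order, assuming a simple modality-to-string mapping. DEFAULT_PLACEHOLDERS and resolve_placeholder are hypothetical names, and the real lookup in chat_utils.py is model-specific and is not reproduced here.

```python
from typing import Optional

# Hypothetical built-in defaults, standing in for a hard-coded lookup.
DEFAULT_PLACEHOLDERS: dict[str, str] = {
    "image": "<image>",
    "video": "<video>",
}


def resolve_placeholder(
    modality: str,
    overrides: Optional[dict[str, str]] = None,
) -> str:
    """Return the placeholder for `modality`, preferring user overrides."""
    # An override (e.g. parsed from --mm-placeholder-str-override) wins.
    if overrides and modality in overrides:
        return overrides[modality]
    # Otherwise fall back to the built-in default for this sketch.
    try:
        return DEFAULT_PLACEHOLDERS[modality]
    except KeyError:
        raise ValueError(f"Unsupported modality: {modality!r}") from None
```

An override such as {"video": "<|video_placeholder|>"} only replaces the modalities it names; every other modality keeps its default.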

Example flag formats:
--mm-placeholder-str-override '{"video":"<|video_placeholder|>", "image": "<|image_placeholder|>"}'
--media-io-kwargs '{"video": {"num_frames": 40, "fps": 2.0, "foo": "bar"}, "image": {"foo":"bar"} }'
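Both flag values above are JSON objects. Below is a minimal sketch of parsing and shape-checking them with the standard json module. The function names are hypothetical, and this is not vLLM's actual argument-parsing code. Only the outer structure is validated, so loader-specific keys such as "foo" pass through untouched, which matches the example values.

```python
import json
from typing import Any


def parse_placeholder_override(raw: str) -> dict[str, str]:
    """Parse a value like --mm-placeholder-str-override into modality -> str."""
    # json.JSONDecodeError is a ValueError subclass, so malformed input
    # also surfaces as ValueError.
    value = json.loads(raw)
    if not isinstance(value, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in value.items()
    ):
        raise ValueError("expected a JSON object mapping modality to string")
    return value


def parse_media_io_kwargs(raw: str) -> dict[str, dict[str, Any]]:
    """Parse a value like --media-io-kwargs into modality -> kwargs dict."""
    value = json.loads(raw)
    # Check only that each modality maps to an object; inner keys are
    # left for the individual media loaders to interpret.
    if not isinstance(value, dict) or not all(
        isinstance(k, str) and isinstance(v, dict) for k, v in value.items()
    ):
        raise ValueError("expected a JSON object mapping modality to an object")
    return value
```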

Test Plan

Unit tests:
pytest tests/engine/test_arg_utils.py
pytest tests/multimodal/test_video.py
pytest tests/multimodal/test_utils.py
vllm serve test: printed and verified that the args are passed through as expected.

vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000 --host 0.0.0.0 --dtype bfloat16 --limit-mm-per-prompt image=5,video=5 --mm-placeholder-str-override '{"video":"<|video_placeholder|>"}' --media-io-kwargs '{"video": {"num_frames": 40, "fps": 2.0, "foo": "bar"}, "image": {"foo":"bar"} }'
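Hand-writing JSON inside shell quotes, as in the command above, is easy to get wrong. When such a command is generated from a script, one option is to serialize the values with json.dumps and let shlex handle the quoting. The sketch below uses only the standard library; build_serve_command is a hypothetical helper that covers just the two new flags plus any extra arguments passed in.

```python
import json
import shlex
from typing import Any, Sequence


def build_serve_command(
    model: str,
    placeholder_override: dict[str, str],
    media_io_kwargs: dict[str, dict[str, Any]],
    extra_args: Sequence[str] = (),
) -> str:
    """Return a shell-safe `vllm serve` command line (hypothetical helper)."""
    args = [
        "vllm", "serve", model,
        *extra_args,
        # json.dumps emits valid JSON; shlex.join adds the shell quoting.
        "--mm-placeholder-str-override", json.dumps(placeholder_override),
        "--media-io-kwargs", json.dumps(media_io_kwargs),
    ]
    return shlex.join(args)
```

Splitting the generated string back with shlex.split returns the original JSON text for each flag, which is a cheap way to check the quoting.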

github-actions · 2025-06-26T00:17:57Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Summary of Changes

Hello @huachenheli, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request makes multimodal placeholder strings and video sampling parameters configurable through CLI flags. Users can integrate new multimodal models and customize video input processing without modifying core framework code, which makes the system easier to extend to future models and use cases.

Highlights

Configurable Multimodal Placeholder Strings: Introduced a new CLI flag --mm-placeholder-str-override that allows users to specify custom placeholder strings for different modalities (e.g., video, image). This removes the need to hardcode these strings within the framework, simplifying the addition of new multimodal models.
Flexible Video Sampling Policies: Added a new CLI flag --video-media-io-kwargs, which passes arbitrary keyword arguments to the video processing pipeline. This allows flexible and extensible video sampling policies beyond the previously hard-coded 32-frame configuration. (The PR description shows the final per-modality form of this flag, --media-io-kwargs.)
Core Configuration Updates: Updated ModelConfig and MultiModalConfig to include the new video_media_io_kwargs and mm_placeholder_str_override fields, ensuring these configurations are propagated throughout the system.
Media Connector Enhancements: Modified the MediaConnector to accept and utilize the new video_media_io_kwargs, ensuring that video fetching and processing can leverage the newly introduced flexible policies.
Video Loader API Extension: Extended the VideoLoader and VideoMediaIO interfaces to accept and pass through arbitrary keyword arguments, allowing custom video loaders to define and use their own specific parameters for video processing.
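The highlights above describe per-modality keyword arguments flowing from the CLI, through the media connector, into a video loader that may accept arbitrary parameters. Below is a toy sketch of that pass-through, assuming a simple uniform-sampling policy. UniformFrameSampler and ToyMediaConnector are hypothetical classes, not vLLM's VideoLoader or MediaConnector, and the default of 32 frames only mirrors the previously hard-coded value mentioned in the PR description.

```python
from typing import Any, Optional


class UniformFrameSampler:
    """Hypothetical loader that picks `num_frames` evenly spaced frame indices."""

    def sample_indices(
        self, total_frames: int, num_frames: int = 32, **kwargs: Any
    ) -> list[int]:
        # Extra keys such as "fps" or "foo" land in **kwargs and are ignored
        # by this policy; a different loader could act on them instead.
        if total_frames <= 0:
            return []
        if num_frames <= 0 or num_frames >= total_frames:
            # Non-positive or oversized num_frames keeps every frame here.
            return list(range(total_frames))
        step = total_frames / num_frames
        return [int(i * step) for i in range(num_frames)]


class ToyMediaConnector:
    """Hypothetical connector that forwards per-modality kwargs to a loader."""

    def __init__(
        self, media_io_kwargs: Optional[dict[str, dict[str, Any]]] = None
    ) -> None:
        self.media_io_kwargs = media_io_kwargs or {}
        self.video_loader = UniformFrameSampler()

    def video_frame_indices(self, total_frames: int) -> list[int]:
        # Only the "video" entry is forwarded; other modalities are untouched.
        video_kwargs = self.media_io_kwargs.get("video", {})
        return self.video_loader.sample_indices(total_frames, **video_kwargs)
```

With {"video": {"num_frames": 4, "fps": 2.0, "foo": "bar"}} and a 100-frame video, this returns [0, 25, 50, 75]; fps and foo are accepted but unused by this particular policy.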

gemini-code-assist

Code Review

This pull request effectively addresses the stated pain points by introducing configurable multimodal placeholder strings and flexible video sampling policies via CLI flags. The changes are well-integrated across the configuration, argument parsing, and multimodal utility layers. New test cases have been added to validate the parsing of the new CLI arguments and the propagation of video processing parameters. The overall code quality is good, with clear intent and appropriate use of dataclasses and type hints. One minor type mismatch was identified in the EngineArgs definition for mm_placeholder_str_override.

vllm/engine/arg_utils.py

vllm/multimodal/utils.py

tests/multimodal/test_utils.py

vllm/multimodal/utils.py

vllm/config.py

vllm/engine/arg_utils.py

DarkLight1337

LGTM, thanks for working on this! @ywang96 any further comments?

DarkLight1337 · 2025-06-28T07:18:42Z

Perhaps we can later work on a solution similar to #20179 to define the placeholder token inside the model class....

ywang96

Left a small nit but otherwise LGTM. Thanks for the contribution!

tests/multimodal/test_utils.py

huachenheli · 2025-06-29T03:58:46Z

Perhaps we can later work on a solution similar to #20179 to define the placeholder token inside the model class....

I suppose we could put this into BaseProcessingInfo which mm models inherit from, assuming new models have the correct processor setup.
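The idea in this reply, declaring placeholders next to each model's processing info instead of in shared chat code, could look roughly like the sketch below. ProcessingInfoSketch, ExampleVideoModelInfo, and get_placeholder_str are hypothetical names and are not vLLM's BaseProcessingInfo API; for what was eventually merged, see the follow-up PR #20355 referenced in this thread.

```python
from typing import ClassVar, Optional


class ProcessingInfoSketch:
    """Hypothetical per-model info base class (not vLLM's BaseProcessingInfo)."""

    # Each model subclass declares its own placeholders; the base has none.
    placeholder_strs: ClassVar[dict[str, str]] = {}

    @classmethod
    def get_placeholder_str(cls, modality: str) -> Optional[str]:
        # None signals that this model does not define the modality.
        return cls.placeholder_strs.get(modality)


class ExampleVideoModelInfo(ProcessingInfoSketch):
    """Hypothetical model: adding it requires no change to shared chat code."""

    placeholder_strs = {
        "image": "<|image_placeholder|>",
        "video": "<|video_placeholder|>",
    }
```

Keeping the mapping on the model class lets framework code stay modality-agnostic: it only asks the model which string to use.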

DarkLight1337 · 2025-07-01T09:31:09Z

Can you merge from main to fix the CI?

huachenheli · 2025-07-01T18:34:39Z

Can you merge from main to fix the CI?

Just merged with main. Let's see if the timeout issue goes away.

yeqcharlotte · 2025-07-03T01:33:02Z

@huachenheli also update the docs?

hmellor · 2025-07-14T08:16:36Z

@yeqcharlotte the CLI arg documentation has returned to https://docs.vllm.ai/en/latest/configuration/engine_args.html#multimodalconfig (it'll soon return to https://docs.vllm.ai/en/latest/cli/index.html too). Is this sufficient documentation for you?

huachenheli added 3 commits June 24, 2025 21:51

Support configurable number of video frames & multimodal placeholder strings.

9ab359c

Signed-off-by: Chenheli Hua <huachenheli@outlook.com>

Switched to more generic video_media_io_kwargs.

17b71de

Fixed tests/multimodal/test_utils.py

1ef2945

huachenheli marked this pull request as ready for review June 26, 2025 00:18

huachenheli requested review from DarkLight1337, ywang96 and aarnphm as code owners June 26, 2025 00:18

gemini-code-assist bot reviewed Jun 26, 2025

mergify bot added the frontend and multi-modality labels Jun 26, 2025

gemini-code-assist bot reviewed Jun 26, 2025

vllm/engine/arg_utils.py (outdated)

huachenheli added 2 commits June 25, 2025 17:20

Fix typo.

6158ae1

Added type ignore.

4e7f5c8

DarkLight1337 added this to Multi-modality Core Jun 26, 2025

DarkLight1337 reviewed Jun 26, 2025

vllm/multimodal/utils.py (outdated)

ywang96 self-assigned this Jun 26, 2025

huachenheli force-pushed the huachenheli/mm_configs branch 2 times, most recently from 1d1f103 to aa4277e June 26, 2025 04:37

huachenheli requested a review from DarkLight1337 June 26, 2025 04:40

huachenheli force-pushed the huachenheli/mm_configs branch 2 times, most recently from 70368b9 to da55875 June 26, 2025 04:45

Address comments.

3f75b64

huachenheli force-pushed the huachenheli/mm_configs branch from da55875 to 3f75b64 June 26, 2025 04:54

DarkLight1337 reviewed Jun 26, 2025

tests/multimodal/test_utils.py (outdated)

DarkLight1337 reviewed Jun 26, 2025

vllm/multimodal/utils.py (outdated)

huachenheli requested review from simon-mo, WoosukKwon, youkaichao and robertgshaw2-redhat as code owners June 26, 2025 16:33

huachenheli requested a review from DarkLight1337 June 27, 2025 15:32

DarkLight1337 reviewed Jun 27, 2025

vllm/multimodal/utils.py (outdated)

DarkLight1337 reviewed Jun 27, 2025

vllm/config.py (outdated)

DarkLight1337 reviewed Jun 27, 2025

vllm/config.py (outdated)

DarkLight1337 reviewed Jun 27, 2025

vllm/engine/arg_utils.py (outdated)

Address comments.

d97f40f

huachenheli requested a review from DarkLight1337 June 27, 2025 15:52

DarkLight1337 approved these changes Jun 27, 2025

DarkLight1337 mentioned this pull request Jun 28, 2025

Add GLM-4.1V model #19331

Merged

ywang96 approved these changes Jun 28, 2025

tests/multimodal/test_utils.py

Update test.

83729d4

DarkLight1337 enabled auto-merge (squash) July 1, 2025 04:59

github-actions bot added the ready label Jul 1, 2025

Merge branch 'main' into huachenheli/mm_configs

f82feef

Fix unit test broken by vllm-project#19331

4c57a9f

auto-merge was automatically disabled July 1, 2025 20:36
Head branch was pushed to by a user without write access

vllm-bot merged commit 2e7cbf2 into vllm-project:main Jul 2, 2025
71 of 75 checks passed

github-project-automation bot moved this to Done in Multi-modality Core Jul 2, 2025

DarkLight1337 mentioned this pull request Jul 2, 2025

[Core] Move multimodal placeholder from chat utils to model definition #20355

Merged

huachenheli deleted the huachenheli/mm_configs branch July 2, 2025 15:47

huydhn pushed a commit to huydhn/vllm that referenced this pull request Jul 8, 2025

[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (vllm-project#20105)

b1e5aa3

LyrisZhong pushed a commit to LyrisZhong/vllm that referenced this pull request Jul 23, 2025

[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (vllm-project#20105)

f6dad62

Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025

[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (vllm-project#20105)

6cc0f51
