Conversation

@zucchini-nlp (Member)

What does this PR do?

Enables vllm-project/vllm#30680 from the transformers side. I had this in a local draft for a very long time, and some models still don't fit neatly: InternVL expands each video frame with per-frame image token ids, and SmolVLM inserts timestamp text between frames.

I think we will start with the models that fit the generic scheme; support for the rest is in progress.
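To make the mismatch concrete, here is a minimal sketch of the three expansion schemes described above. The helper names, token strings, and SmolVLM timestamp format are illustrative assumptions, not the actual transformers or vLLM API:

```python
# Hypothetical sketch of three video placeholder-expansion schemes.
# Token strings and timestamp formatting are illustrative, not the real ones.

def expand_generic(num_frames: int, tokens_per_frame: int) -> list[str]:
    # Easy case: the video is one contiguous run of a single video token id,
    # so a backend can compute placeholder positions from counts alone.
    return ["<video>"] * (num_frames * tokens_per_frame)

def expand_internvl_style(num_frames: int, tokens_per_frame: int) -> list[str]:
    # InternVL-style: each frame is expanded with *image* token ids,
    # so video inputs reuse the image placeholder per frame.
    return ["<image>"] * (num_frames * tokens_per_frame)

def expand_smolvlm_style(num_frames: int, tokens_per_frame: int,
                         fps: float = 1.0) -> list[str]:
    # SmolVLM-style: timestamp text is interleaved between frames, so the
    # expansion is no longer a uniform run of placeholder tokens.
    out: list[str] = []
    for i in range(num_frames):
        out.append(f"Frame at {i / fps:.1f}s:")
        out.extend(["<image>"] * tokens_per_frame)
    return out
```

The first scheme maps cleanly onto a generic video path; the other two need model-specific handling, which is why those models are deferred for now.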

fyi @hmellor

@zucchini-nlp changed the title from "Video mm tokens" to "Video support in vLLM backend" on Dec 17, 2025
@zucchini-nlp changed the title from "Video support in vLLM backend" to "[WIP] Video support in vLLM backend" on Dec 17, 2025
@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: blip_2, got_ocr2, instructblip, instructblipvideo, internvl, llava_next_video, llava_onevision, perception_lm, qwen2_5_omni, qwen2_5_vl, qwen2_vl, smolvlm, video_llava

@github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42919&sha=ce0dfb

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
