Prepare processors for VideoLLMs #36149

zucchini-nlp · 2025-02-12T11:11:21Z

What does this PR do?

As discussed internally, these changes will help SmolVLM2 to be consistent with how processors usually work for other VLMs and lay ground for other video-LLMs in the future. The current API for chat templates should be more flexible now to handle special models

TODO:

allow video decoders to accept a callable
Tests to cover new cases (more tests should be added with SmolVLM's processor)

At the end we should have the below working before release. We might do a short cut by overriding apply_chat_template if some things fail on the way, and then refactor out after release

inputs = processor.apply_chat_template(history, num_frames=8, sampling_fps=1.0, skip_secs=1.0, return_tensors='pt', **kwargs)
model.generate(**inputs)

cc @molbap @yonigozlan

HuggingFaceDocBuilderDev · 2025-02-12T12:00:08Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

molbap

Thanks for the quick iteration! Added a couple comments of things I'm not sure design or code-wise

src/transformers/processing_utils.py

src/transformers/image_utils.py

src/transformers/processing_utils.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

molbap

LGTM, @qubvel if you want to take a look as well!

src/transformers/processing_utils.py

molbap · 2025-02-13T11:05:05Z

src/transformers/image_utils.py

-                f"Make sure that fps of a video is less than the requested fps for loading. Detected video_fps={video_fps}"
-            )
-    indices = get_uniform_frame_indices(total_num_frames, num_frames=num_frames)
+    duration = total_num_frames / video_fps if video_fps else 0


random thought, what does it mean to have a duration 0 video?

An error occured within video decoder and it couldn't give us back the duration. Rarely that can happen

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

qubvel

Thanks! Made a quick look and left a coment, but it's up to you

src/transformers/image_utils.py

zucchini-nlp added 2 commits February 12, 2025 12:08

allow processor to preprocess conversation + video metadata

d122282

allow callable

14a83c8

add test

9791742

zucchini-nlp changed the title ~~[WIP] Prepare processors for VideoLLMs~~ Prepare processors for VideoLLMs Feb 12, 2025

zucchini-nlp requested a review from molbap February 12, 2025 12:31

zucchini-nlp added 5 commits February 12, 2025 13:59

fix test

03b6a30

Merge branch 'main' into video_decoders

942c72d

nit: fix

5aea483

Merge branch 'main' into video_decoders

9d3c36d

add metadata frames_indices

890ff90

molbap reviewed Feb 12, 2025

View reviewed changes

zucchini-nlp and others added 4 commits February 13, 2025 10:15

Update src/transformers/processing_utils.py

db50996

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

Update src/transformers/processing_utils.py

d4d5b08

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

port updates from Orr and add one more test

66ea798

Merge branch 'main' into video_decoders

584f5e9

molbap approved these changes Feb 13, 2025

View reviewed changes

zucchini-nlp and others added 3 commits February 13, 2025 12:12

Update src/transformers/processing_utils.py

bab030e

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

typo

754f344

Merge branch 'main' into video_decoders

6698a86

qubvel reviewed Feb 13, 2025

View reviewed changes

src/transformers/image_utils.py Outdated Show resolved Hide resolved

src/transformers/image_utils.py Outdated Show resolved Hide resolved

zucchini-nlp added 7 commits February 13, 2025 12:39

as dataclass

f7a47fd

style

ee76845

Merge branch 'main' into video_decoders

c3fa1f0

Merge branch 'main' into video_decoders

b3610a0

Merge branch 'main' into video_decoders

c084afa

docstring + maek sure tests green

ee4fee6

Merge branch 'main' into video_decoders

a639520

zucchini-nlp merged commit 15ec971 into huggingface:main Feb 14, 2025
25 checks passed

MilkClouds mentioned this pull request Mar 14, 2025

return_assistant_tokens_mask argument is blocked in ProcessorMixin.apply_chat_template #36713

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Prepare processors for VideoLLMs #36149

Prepare processors for VideoLLMs #36149

Uh oh!

zucchini-nlp commented Feb 12, 2025 •

edited

Loading

Uh oh!

HuggingFaceDocBuilderDev commented Feb 12, 2025

Uh oh!

molbap left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

molbap left a comment

Uh oh!

Uh oh!

molbap Feb 13, 2025

Uh oh!

zucchini-nlp Feb 13, 2025

Uh oh!

qubvel left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Prepare processors for VideoLLMs #36149

Prepare processors for VideoLLMs #36149

Uh oh!

Conversation

zucchini-nlp commented Feb 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Uh oh!

HuggingFaceDocBuilderDev commented Feb 12, 2025

Uh oh!

molbap left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

molbap left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

molbap Feb 13, 2025

Choose a reason for hiding this comment

Uh oh!

zucchini-nlp Feb 13, 2025

Choose a reason for hiding this comment

Uh oh!

qubvel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zucchini-nlp commented Feb 12, 2025 •

edited

Loading