Skip to content

Conversation

@zucchini-nlp
Copy link
Member

@zucchini-nlp zucchini-nlp commented Feb 12, 2025

What does this PR do?

As discussed internally, these changes will help SmolVLM2 to be consistent with how processors usually work for other VLMs and lay ground for other video-LLMs in the future. The current API for chat templates should be more flexible now to handle special models

TODO:

  • allow video decoders to accept a callable
  • Tests to cover new cases (more tests should be added with SmolVLM's processor)

At the end we should have the below working before release. We might do a short cut by overriding apply_chat_template if some things fail on the way, and then refactor out after release

inputs = processor.apply_chat_template(history, num_frames=8, sampling_fps=1.0, skip_secs=1.0, return_tensors='pt', **kwargs)
model.generate(**inputs)

cc @molbap @yonigozlan

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title [WIP] Prepare processors for VideoLLMs Prepare processors for VideoLLMs Feb 12, 2025
@zucchini-nlp zucchini-nlp requested a review from molbap February 12, 2025 12:31
Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick iteration! Added a couple comments of things I'm not sure design or code-wise

zucchini-nlp and others added 4 commits February 13, 2025 10:15
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @qubvel if you want to take a look as well!

f"Make sure that fps of a video is less than the requested fps for loading. Detected video_fps={video_fps}"
)
indices = get_uniform_frame_indices(total_num_frames, num_frames=num_frames)
duration = total_num_frames / video_fps if video_fps else 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random thought, what does it mean to have a duration 0 video?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An error occured within video decoder and it couldn't give us back the duration. Rarely that can happen

zucchini-nlp and others added 3 commits February 13, 2025 12:12
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Copy link
Contributor

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Made a quick look and left a coment, but it's up to you

@zucchini-nlp zucchini-nlp merged commit 15ec971 into huggingface:main Feb 14, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants