-
Notifications
You must be signed in to change notification settings - Fork 31.3k
Prepare processors for VideoLLMs #36149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
molbap
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick iteration! Added a couple comments of things I'm not sure design or code-wise
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
molbap
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @qubvel if you want to take a look as well!
| f"Make sure that fps of a video is less than the requested fps for loading. Detected video_fps={video_fps}" | ||
| ) | ||
| indices = get_uniform_frame_indices(total_num_frames, num_frames=num_frames) | ||
| duration = total_num_frames / video_fps if video_fps else 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
random thought, what does it mean to have a duration 0 video?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An error occured within video decoder and it couldn't give us back the duration. Rarely that can happen
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
qubvel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Made a quick look and left a coment, but it's up to you
What does this PR do?
As discussed internally, these changes will help SmolVLM2 to be consistent with how processors usually work for other VLMs and lay ground for other video-LLMs in the future. The current API for chat templates should be more flexible now to handle special models
TODO:
At the end we should have the below working before release. We might do a short cut by overriding
apply_chat_templateif some things fail on the way, and then refactor out after releasecc @molbap @yonigozlan