Distributed inference example for llava_next #3179

Open
wants to merge 4 commits into
base: main
from

Conversation

@VladOS95-cyber VladOS95-cyber commented Oct 20, 2024

Add distributed inference example for LLaVA-NeXT-Video-7B-hf

Before submitting

Who can review?

@sayakpaul @a-r-r-o-w @muellerzr

@VladOS95-cyber
Author

Hi @sayakpaul @a-r-r-o-w! This PR is ready for review; please take a look.

Member

@sayakpaul sayakpaul left a comment

Thanks for working on this so quickly. Left some comments.

indices = np.arange(0, total_frames, total_frames / 8).astype(int)
video = read_video_pyav(container, indices)

conversations = [
Member

Maybe we could repurpose this to just yield captions?

Author

Hey @sayakpaul! What do you mean by "repurpose" here? The idea of this example was to pair user prompts (questions) with certain videos and process them. If we only want captions, what exactly are we going to do with them?

Member

Train text-to-video models, for one

Author

@VladOS95-cyber VladOS95-cyber Oct 22, 2024

@sayakpaul, please take a look at the recent changes. What do you think?

Author

@VladOS95-cyber VladOS95-cyber Oct 27, 2024

Hey @sayakpaul! Do you have any comments on the recent changes? I added video dataset loading and splitting into batches; for every batch of videos, the prepared prompts are then distributed to generate answers.
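
To illustrate the flow described in this comment, here is a minimal, library-free sketch of batching videos and then sharding the (video, prompt) pairs of each batch across processes. It is not the code from this PR: `batched`, `shard_for_rank`, the file names, and the prompts are all hypothetical, and the ranks are simulated in a loop. In a real script, Accelerate's `PartialState.split_between_processes` is the utility that plays this role; the sketch does not try to reproduce its exact behavior.

```python
from itertools import islice
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def shard_for_rank(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Return the contiguous slice of `items` owned by process `rank`.

    The first `len(items) % world_size` ranks receive one extra item.
    """
    base, extra = divmod(len(items), world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return list(items[start:end])


# Hypothetical inputs: file names and questions are placeholders.
videos = ["clip_0.mp4", "clip_1.mp4", "clip_2.mp4"]
prompts = ["What happens in the video?", "Describe the scene.", "Who appears?"]
world_size = 2

for video_batch in batched(videos, batch_size=2):
    # Every prompt is paired with every video in the current batch.
    pairs = [(video, prompt) for video in video_batch for prompt in prompts]
    # Simulated here; in a real run each process handles only its own rank.
    for rank in range(world_size):
        work = shard_for_rank(pairs, rank, world_size)
        print(f"rank {rank}: {len(work)} (video, prompt) pairs")
```

Contiguous sharding keeps each process's work items adjacent, which makes it straightforward to concatenate the per-process results back into the original order afterwards.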

examples/inference/distributed/llava_next_video.py (three outdated review threads, resolved)
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@VladOS95-cyber force-pushed the add-video-capture-example-on-distributed-inference branch from 91ee412 to 0a58faa on October 24, 2024 14:16

3 participants