Image-Text-to-Text Support in Transformers Pipeline #34169

Open
chakravarthik27 opened this issue Oct 15, 2024 · 2 comments
Labels
Feature request: Request for a new feature

Comments

@chakravarthik27

Feature request

Add a pipeline that takes both an image and text as input and produces text as output. This would be particularly useful for multi-modal tasks such as visual question answering (VQA), image captioning, and image-based text generation.

from transformers import pipeline

# Initialize the pipeline with a multi-modal model
multi_modal_pipeline = pipeline("image-text-to-text", model="meta-llama/Llama-3.2-11B-Vision-Instruct")

# Example usage
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
result = multi_modal_pipeline(messages)
print(result)  # Should return generated text based on the image and the text prompt

Motivation

  • Simplifies workflows involving multi-modal data.
  • Enables more complex and realistic tasks to be handled with existing Transformer models.
  • Encourages more multi-modal model usage in research and production.

Your contribution

Transformers Integration
Ensure that the pipeline works well within the Hugging Face Transformers library:

  • Implement the custom pipeline class (ImageTextToTextPipeline).
  • Add support for both input types (image and text) and make sure they pass through preprocessing and the model's forward pass correctly.
from transformers import Pipeline

class ImageTextToTextPipeline(Pipeline):
    ...
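To make the two bullets above more concrete, here is a rough, dependency-free sketch of how such a class might be laid out. It only mirrors the four hooks that a Transformers `Pipeline` subclass implements (`_sanitize_parameters`, `preprocess`, `_forward`, `postprocess`). The names `ImageTextToTextSketch`, `count_image_slots`, `join_text_parts`, and `fake_generate` are hypothetical, the "model" is a stand-in callable, and the rule that the number of images must match the number of `{"type": "image"}` entries is an assumption, not something this issue specifies.

```python
from typing import Any, Callable, Dict, List


def count_image_slots(messages: List[Dict[str, Any]]) -> int:
    """Count {"type": "image"} entries across all chat messages."""
    total = 0
    for message in messages:
        content = message.get("content", [])
        if isinstance(content, str):
            continue  # plain-text message, no image slots
        total += sum(1 for part in content if part.get("type") == "image")
    return total


def join_text_parts(messages: List[Dict[str, Any]]) -> str:
    """Concatenate every text part of every message into one prompt string."""
    texts: List[str] = []
    for message in messages:
        content = message.get("content", [])
        if isinstance(content, str):
            texts.append(content)
            continue
        texts.extend(p["text"] for p in content if p.get("type") == "text")
    return " ".join(texts)


class ImageTextToTextSketch:
    """Hypothetical stand-in that mirrors the hooks of a Pipeline subclass."""

    def __init__(self, generate_fn: Callable[..., str]):
        # generate_fn stands in for processor + model.generate + decoding.
        self.generate_fn = generate_fn

    def _sanitize_parameters(self, **kwargs):
        # Route call-time kwargs to the stage that consumes them.
        forward_kwargs = {}
        if "max_new_tokens" in kwargs:
            forward_kwargs["max_new_tokens"] = kwargs["max_new_tokens"]
        return {}, forward_kwargs, {}

    def preprocess(self, inputs, **kwargs):
        images, messages = inputs["images"], inputs["messages"]
        slots = count_image_slots(messages)
        if slots != len(images):
            raise ValueError(f"{slots} image slot(s) but {len(images)} image(s)")
        return {"images": images, "prompt": join_text_parts(messages)}

    def _forward(self, model_inputs, **kwargs):
        text = self.generate_fn(
            model_inputs["images"], model_inputs["prompt"], **kwargs
        )
        return {"generated_text": text}

    def postprocess(self, model_outputs, **kwargs):
        return [{"generated_text": model_outputs["generated_text"]}]

    def __call__(self, images, messages, **kwargs):
        pre_kw, fwd_kw, post_kw = self._sanitize_parameters(**kwargs)
        model_inputs = self.preprocess(
            {"images": images, "messages": messages}, **pre_kw
        )
        model_outputs = self._forward(model_inputs, **fwd_kw)
        return self.postprocess(model_outputs, **post_kw)


def fake_generate(images, prompt, **kwargs):
    # Placeholder "model": reports what it received instead of generating.
    return f"{len(images)} image(s): {prompt}"


sketch = ImageTextToTextSketch(fake_generate)
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this picture."},
    ]}
]
result = sketch(images=["placeholder-image"], messages=messages)
print(result)
```

In a real implementation the stand-in callable would be replaced by the model's processor and `generate` call, and the class would inherit from `Pipeline` rather than defining `__call__` itself.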
  
@chakravarthik27 added the "Feature request" label on Oct 15, 2024
@yonigozlan
Member

Good timing ;) #34170

@NOOB-del-ai

I'd like to work on this feature request.
