Image-Text-to-Text Support in Transformers Pipeline #34169

chakravarthik27 · 2024-10-15T07:50:21Z

Feature request

Add a pipeline that takes both an image and text as inputs and produces text as output. This would be particularly useful for multi-modal tasks such as visual question answering (VQA), image captioning, and image-based text generation.

from transformers import pipeline

# Initialize the pipeline with a multi-modal model
multi_modal_pipeline = pipeline("image-text-to-text", model="meta-llama/Llama-3.2-11B-Vision-Instruct")

# Example usage: a chat-style prompt with an image placeholder followed by text
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
result = multi_modal_pipeline(messages)
print(result)  # Should return text generated from the image and the prompt
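
The example above contains an {"type": "image"} placeholder but does not show how the image itself reaches the pipeline. One check such a pipeline would probably need is that the number of placeholders matches the number of images supplied. The following is a minimal sketch of that check in plain Python; the helper names count_image_placeholders and check_images_match are hypothetical and are not part of Transformers.

```python
from typing import Any


def count_image_placeholders(messages: list[dict[str, Any]]) -> int:
    """Count {"type": "image"} entries across all chat messages."""
    count = 0
    for message in messages:
        content = message.get("content", [])
        # Plain-string content carries no image placeholders.
        if isinstance(content, str):
            continue
        count += sum(1 for part in content if part.get("type") == "image")
    return count


def check_images_match(messages: list[dict[str, Any]], images: list[Any]) -> None:
    """Raise ValueError if the supplied images do not match the placeholders."""
    expected = count_image_placeholders(messages)
    if expected != len(images):
        raise ValueError(
            f"messages contain {expected} image placeholder(s) "
            f"but {len(images)} image(s) were supplied"
        )


# Same message structure as in the feature request.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "},
    ]}
]
print(count_image_placeholders(messages))  # prints 1
```

Failing early with a clear message is likely friendlier than letting a mismatch surface later as a shape error inside the model.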

Motivation

Simplifies workflows involving multi-modal data.
Enables more complex and realistic tasks to be handled with existing Transformer models.
Encourages wider use of multi-modal models in research and production.

Your contribution

Transformers Integration
Ensure that the pipeline works well within the Hugging Face Transformers library:

Implement the custom pipeline class (ImageTextToTextPipeline).
Add support for handling different data types (image, text) and ensure smooth forward pass execution.

from transformers import Pipeline

class ImageTextToTextPipeline(Pipeline):
    ...  # implementation to be added
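
To make the class stub more concrete: a Pipeline subclass in Transformers defines four methods, _sanitize_parameters, preprocess, _forward and postprocess, and the base class chains them when the pipeline is called. The sketch below mirrors that split without depending on the library, so it runs anywhere. The class name ImageTextToTextPipelineSketch, the fake_model stand-in and the strip option are all hypothetical and are used only for illustration; a real implementation would subclass Pipeline and call an actual processor and model.

```python
from typing import Any, Callable


class ImageTextToTextPipelineSketch:
    """Library-free stand-in mirroring the stages a Pipeline subclass defines."""

    def __init__(self, model: Callable[[dict[str, Any]], str]):
        self.model = model

    def _sanitize_parameters(self, **kwargs):
        # Route call-time kwargs to the stage that consumes them.
        postprocess_kwargs = {}
        if "strip" in kwargs:  # hypothetical option, for illustration only
            postprocess_kwargs["strip"] = kwargs["strip"]
        return {}, {}, postprocess_kwargs

    def preprocess(self, inputs, **kwargs):
        # Validate both modalities before they reach the model.
        image, text = inputs["image"], inputs["text"]
        if image is None:
            raise ValueError("an image is required")
        if not isinstance(text, str):
            raise TypeError("text must be a string")
        return {"image": image, "text": text}

    def _forward(self, model_inputs, **kwargs):
        # A real pipeline would run the model's generation step here.
        return {"generated_text": self.model(model_inputs)}

    def postprocess(self, model_outputs, strip=True):
        text = model_outputs["generated_text"]
        return [{"generated_text": text.strip() if strip else text}]

    def __call__(self, image, text, **kwargs):
        # In Transformers, the Pipeline base class performs this chaining.
        pre_kw, fwd_kw, post_kw = self._sanitize_parameters(**kwargs)
        model_inputs = self.preprocess({"image": image, "text": text}, **pre_kw)
        model_outputs = self._forward(model_inputs, **fwd_kw)
        return self.postprocess(model_outputs, **post_kw)


def fake_model(model_inputs):
    # Stand-in for a vision-language model: echoes the prompt and image size.
    return f" {model_inputs['text']}[{len(model_inputs['image'])} bytes] "


pipe = ImageTextToTextPipelineSketch(fake_model)
result = pipe(image=b"\x89PNG", text="A haiku: ")
print(result)  # prints [{'generated_text': 'A haiku: [4 bytes]'}]
```

Keeping validation in preprocess and output formatting in postprocess leaves _forward as the only stage that touches the model, which is the separation the Pipeline base class is built around.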

yonigozlan · 2024-10-15T09:42:29Z

Good timing ;) #34170

NOOB-del-ai · 2024-10-17T01:45:12Z

I'd like to work on this feature request.

chakravarthik27 added the "Feature request" label on Oct 15, 2024
