Feature request
Add a pipeline that takes both an image and text as inputs and produces text as output. This would be particularly useful for multi-modal tasks such as visual question answering (VQA), image captioning, or image-based text generation.
from transformers import pipeline

# Initialize the pipeline with a multi-modal model
multi_modal_pipeline = pipeline("image-text-to-text", model="meta-llama/Llama-3.2-11B-Vision-Instruct")

# Example usage
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
result = multi_modal_pipeline(messages)
print(result)  # Should return an answer or relevant text based on the image and question
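The `messages` list in the example contains an `{"type": "image"}` placeholder but no image data, so an implementation would need some way to pair each placeholder with an actual image. Below is a minimal sketch of that check, assuming images are supplied as a separate list. The `collect_inputs` helper, its `images` argument, and the `"cat.png"` value are hypothetical names used for illustration only; none of them is part of Transformers.

```python
from __future__ import annotations

from typing import Any


def collect_inputs(messages: list[dict[str, Any]], images: list[Any]) -> tuple[str, list[Any]]:
    """Hypothetical helper: gather the text parts of a chat and check the image count."""
    text_parts: list[str] = []
    placeholders = 0
    for message in messages:
        for part in message["content"]:
            if part["type"] == "text":
                text_parts.append(part["text"])
            elif part["type"] == "image":
                # Each image entry is only a placeholder; the data comes from `images`.
                placeholders += 1
            else:
                raise ValueError(f"Unsupported content type: {part['type']!r}")
    if placeholders != len(images):
        raise ValueError(f"Expected {placeholders} image(s), got {len(images)}")
    return " ".join(text_parts), images


messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "},
    ]}
]

# "cat.png" is a stand-in value; a real caller would pass an image object or path.
prompt, checked_images = collect_inputs(messages, images=["cat.png"])
```

Failing early on a placeholder/image mismatch keeps the error close to the user's input rather than deep inside the model's forward pass.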
Motivation
Simplifies workflows involving multi-modal data.
Enables more complex and realistic tasks to be handled with existing Transformer models.
Encourages more multi-modal model usage in research and production.
Your contribution
Transformers Integration
Ensure that the pipeline works well within the Hugging Face Transformers library:
Implement the custom pipeline class (ImageTextToTextPipeline).
Add support for handling different data types (image, text) and ensure smooth forward pass execution.
class ImageTextToTextPipeline(Pipeline):
    ...
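For orientation, `Pipeline` subclasses in Transformers implement `_sanitize_parameters`, `preprocess`, `_forward`, and `postprocess`. The sketch below is a dependency-free mock of the last three stages, to show how image and text inputs could flow through them. It does not subclass the real `Pipeline`, it omits `_sanitize_parameters` for brevity, and `ImageTextToTextPipelineSketch`, `generate_fn`, `fake_generate`, and the output keys are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Callable


class ImageTextToTextPipelineSketch:
    """Mock of the stage layout only; NOT the real transformers.Pipeline."""

    def __init__(self, generate_fn: Callable[[dict[str, Any]], str]):
        # generate_fn stands in for the processor plus model generation (an assumption).
        self.generate_fn = generate_fn

    def preprocess(self, inputs: dict[str, Any]) -> dict[str, Any]:
        # Require both modalities and normalise them into one model-input dict.
        if "image" not in inputs or "text" not in inputs:
            raise ValueError("Both 'image' and 'text' are required")
        return {"image": inputs["image"], "prompt": inputs["text"].strip()}

    def _forward(self, model_inputs: dict[str, Any]) -> dict[str, Any]:
        # Run the (stubbed) model and carry the prompt along for postprocessing.
        generated = self.generate_fn(model_inputs)
        return {"prompt": model_inputs["prompt"], "generated_text": generated}

    def postprocess(self, model_outputs: dict[str, Any]) -> list[dict[str, str]]:
        # Return a list of dicts; the key names here are hypothetical.
        return [{
            "input_text": model_outputs["prompt"],
            "generated_text": model_outputs["generated_text"],
        }]

    def __call__(self, inputs: dict[str, Any]) -> list[dict[str, str]]:
        return self.postprocess(self._forward(self.preprocess(inputs)))


def fake_generate(model_inputs: dict[str, Any]) -> str:
    """Stub that replaces a real vision-language model."""
    return f"(stub output for prompt: {model_inputs['prompt']})"


pipe = ImageTextToTextPipelineSketch(fake_generate)
result = pipe({"image": b"raw-bytes", "text": " Describe this image. "})
```

Keeping modality handling in `preprocess` means `_forward` only ever sees one normalised input dict, which is the separation the "smooth forward pass execution" point asks for.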