Description
The model to consider.
patrickjohncyh/fashion-clip (https://huggingface.co/patrickjohncyh/fashion-clip). The model is based on the CLIP architecture, which is popular for image-text tasks such as computing joint image and text embeddings.
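For context, the checkpoint works as a standard Transformers CLIP model; below is a minimal sketch of the usual Transformers-side usage (the caption string is just an example).

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the CLIP checkpoint and its processor from the Hub
model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

# Prepare one image and one example caption
image = Image.open("/Users/pkumari/Downloads/monitor.png")
inputs = processor(text=["a computer monitor"], images=image,
                   return_tensors="pt", padding=True)

# Forward pass returns projected embeddings for both modalities
outputs = model(**inputs)
print(outputs.image_embeds.shape)  # image embedding
print(outputs.text_embeds.shape)   # text embedding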
The closest model vllm already supports.
The Transformers CLIPModel implementation (CLIP-style image/text embedding models).
What's your difficulty of supporting the model you want?
When I tried running the model with vLLM to compute image embeddings, I got the error below.
Code
from vllm import LLM
from PIL import Image
from transformers import CLIPProcessor

# Load the model with vLLM (this is where the error below is raised)
llm = LLM(model="patrickjohncyh/fashion-clip")

# Load and preprocess the image
image = Image.open("/Users/pkumari/Downloads/monitor.png")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")
inputs = processor(images=image, return_tensors="pt")

# Attempt to generate image embeddings
outputs = llm.generate({
    "prompt": "<image>",
    "multi_modal_data": {"image": inputs["pixel_values"]},
})

# Print the embeddings
print(outputs)
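What I am ultimately hoping for is an embedding call along the lines of vLLM's existing pooling interface. This is only a sketch of the desired usage, not working code; the task="embed" argument, encode() with multimodal input, and the output attribute name are assumptions based on how other embedding models are served.

from vllm import LLM
from PIL import Image

# Hypothetical usage once CLIP is supported; task name and multimodal
# pooling support are assumptions, not a documented vLLM API for CLIPModel.
llm = LLM(model="patrickjohncyh/fashion-clip", task="embed")

image = Image.open("/Users/pkumari/Downloads/monitor.png")
outputs = llm.encode({
    "prompt": "<image>",
    "multi_modal_data": {"image": image},  # pass the PIL image, not pixel_values
})
print(outputs[0].outputs.embedding)  # assumed attribute name for the pooled vector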
Error
[rank0]: ValueError: CLIPModel has no vLLM implementation and the Transformers implementation is not compatible with vLLM. Try setting VLLM_USE_V1=0.
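For completeness, the fallback suggested by the error message can be tried as below (a sketch; whether the V0 engine actually handles CLIPModel is not confirmed here).

import os

# The error suggests trying the V0 engine; set this before importing vllm.
# It may still fail if the architecture has no compatible implementation.
os.environ["VLLM_USE_V1"] = "0"

from vllm import LLM
llm = LLM(model="patrickjohncyh/fashion-clip")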
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.