
[New Model]: support for fashion-clip #16019

Open
@priyankaiiit14

Description

The model to consider.

patrickjohncyh/fashion-clip. This model is based on the CLIP architecture, which is popular for image-text tasks.

The closest model vllm already supports.

The Transformers CLIP implementation (CLIPModel).
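
For reference, a minimal sketch of how fashion-clip runs today with plain Transformers, using the standard CLIPModel/CLIPProcessor APIs (the image path and candidate texts below are placeholders):

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Reference implementation: Hugging Face CLIPModel and its processor
model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

# Placeholder image and candidate texts
image = Image.open("monitor.png")
inputs = processor(
    text=["a computer monitor", "a red dress"],
    images=image,
    return_tensors="pt",
    padding=True,
)

# Forward pass: pooled embeddings plus image-text similarity scores
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.image_embeds.shape)   # pooled image embedding
print(outputs.logits_per_image)     # similarity of the image to each text

Since fashion-clip is used for embeddings and image-text similarity rather than text generation, I'd expect this to map onto vLLM's pooling/embedding path rather than generate.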

What's your difficulty of supporting the model you want?

When I tried evaluating the model with vLLM, I got the error shown below.

Code

from vllm import LLM
from PIL import Image
from transformers import CLIPProcessor

# Load the model -- the ValueError below is raised here, while vLLM resolves
# the CLIPModel architecture
llm = LLM(model="patrickjohncyh/fashion-clip")

# Load and preprocess the image
image = Image.open("/Users/pkumari/Downloads/monitor.png")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")
inputs = processor(images=image, return_tensors="pt")

# Generate embeddings
outputs = llm.generate({
    "prompt": "<image>",
    "multi_modal_data": {"image": inputs["pixel_values"]},
})

# Print the embeddings
print(outputs)

Error

[rank0]: ValueError: CLIPModel has no vLLM implementation and the Transformers implementation is not compatible with vLLM. Try setting VLLM_USE_V1=0.
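
The error message suggests setting VLLM_USE_V1=0. A minimal sketch of that workaround, assuming the environment variable should be set before vLLM is imported (whether this actually makes CLIPModel load here is unverified):

import os

# Fall back to the V0 engine, as the error message suggests; set this before
# importing vLLM (unverified whether CLIPModel then loads successfully)
os.environ["VLLM_USE_V1"] = "0"

from vllm import LLM

llm = LLM(model="patrickjohncyh/fashion-clip")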

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
