
[Feature] Support image input with OpenAI-compatible model endpoint #1777

Closed as not planned
@peterschmidt85

Description

Steps to reproduce

  1. Run Llama 3.2 as a service:

```yaml
type: service
name: llama32

image: vllm/vllm-openai:latest
env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-11B-Vision-Instruct
  - MAX_MODEL_LEN=4096
  - MAX_NUM_SEQS=8
commands:
  - vllm serve $MODEL_ID
    --max-model-len $MAX_MODEL_LEN
    --max-num-seqs $MAX_NUM_SEQS
    --enforce-eager
    --disable-log-requests
    --limit-mm-per-prompt "image=1"
    --tensor-parallel-size $DSTACK_GPUS_NUM
port: 8000
model: meta-llama/Llama-3.2-11B-Vision-Instruct

resources:
  gpu: 40GB..48GB
```
  2. Access the model via the model endpoint:

```shell
curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer token' \
    --data '{
        "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image."},
                {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/e/ea/Bento_at_Hanabishi%2C_Koyasan.jpg"}}
            ]
        }],
        "max_tokens": 2048
    }'
```

The request fails with an error.

  3. Access the model via the service endpoint:

```shell
curl http://127.0.0.1:3000/proxy/services/main/llama32/chat/completions \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer token' \
    --data '{
        "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image."},
                {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/e/ea/Bento_at_Hanabishi%2C_Koyasan.jpg"}}
            ]
        }],
        "max_tokens": 2048
    }'
```

The request succeeds.

The proxy and the gateway should support vision requests (messages whose `content` is a list of `text` and `image_url` parts) in addition to plain-text requests.
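Since the two reproduction requests differ only in their URL, a small script can make the comparison repeatable. This is a minimal sketch, assuming the local server address, project `main`, service `llama32`, and token `token` from the steps above; the function names (`vision_payload`, `status_of`, `compare_endpoints`) and the variables `DSTACK_URL` and `DSTACK_TOKEN` are hypothetical and not part of dstack. It sends one identical payload to both endpoints and prints each HTTP status code.

```shell
#!/usr/bin/env bash
# Hypothetical helper script, not part of dstack.
# DSTACK_URL / DSTACK_TOKEN are hypothetical variable names; the defaults
# mirror the curl commands in the steps above.
DSTACK_URL="${DSTACK_URL:-http://127.0.0.1:3000}"
DSTACK_TOKEN="${DSTACK_TOKEN:-token}"

# Print the OpenAI-style chat request. Its `content` is a list of parts
# (one text part and one image_url part) rather than a plain string.
vision_payload() {
  cat <<'EOF'
{
  "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe the image."},
      {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/e/ea/Bento_at_Hanabishi%2C_Koyasan.jpg"}}
    ]
  }],
  "max_tokens": 2048
}
EOF
}

# POST the payload to the given URL and print only the HTTP status code.
# curl prints "000" when the connection itself fails; `|| true` keeps a
# failed request from aborting a caller that uses `set -e`.
status_of() {
  curl -s -o /dev/null -w '%{http_code}' "$1" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $DSTACK_TOKEN" \
    --data "$(vision_payload)" || true
}

# Send the identical request to the model endpoint and the service endpoint.
compare_endpoints() {
  local url
  for url in \
    "$DSTACK_URL/proxy/models/main/chat/completions" \
    "$DSTACK_URL/proxy/services/main/llama32/chat/completions"; do
    echo "$url -> $(status_of "$url")"
  done
}
```

Usage: `source compare.sh && compare_endpoints`. Based on the report above, only the service-endpoint request would be expected to return `200` until the model endpoint accepts list-valued `content`.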
