Description
Steps to reproduce
- Run Llama 3.2:

```yaml
type: service
name: llama32
image: vllm/vllm-openai:latest
env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-11B-Vision-Instruct
  - MAX_MODEL_LEN=4096
  - MAX_NUM_SEQS=8
commands:
  - vllm serve $MODEL_ID
    --max-model-len $MAX_MODEL_LEN
    --max-num-seqs $MAX_NUM_SEQS
    --enforce-eager
    --disable-log-requests
    --limit-mm-per-prompt "image=1"
    --tensor-parallel-size $DSTACK_GPUS_NUM
port: 8000
model: meta-llama/Llama-3.2-11B-Vision-Instruct
resources:
  gpu: 40GB..48GB
```
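The service can then be deployed with the dstack CLI. A minimal sketch, assuming the configuration above is saved as `llama32.dstack.yml` (the file name is illustrative):

```shell
# Deploy the service defined above; the file name is an assumption for illustration
dstack apply -f llama32.dstack.yml
```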
- Access via the model endpoint:

```shell
curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer token' \
  --data '{
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe the image."},
          {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/e/ea/Bento_at_Hanabishi%2C_Koyasan.jpg"}}
        ]
      }
    ],
    "max_tokens": 2048
  }'
```
This request does not work and returns an error.
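For comparison, a text-only request to the same model endpoint can be used to confirm that only the multimodal payload is affected. A minimal sketch, assuming the same endpoint and token as above:

```shell
# Same model endpoint, but with a plain-text "content" field instead of the
# multimodal content array (illustrative; the prompt text is arbitrary)
curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer token' \
  --data '{
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [
      {"role": "user", "content": "Describe a bento box."}
    ],
    "max_tokens": 256
  }'
```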
- Access the service endpoint:

```shell
curl http://127.0.0.1:3000/proxy/services/main/llama32/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer token' \
  --data '{
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe the image."},
          {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/e/ea/Bento_at_Hanabishi%2C_Koyasan.jpg"}}
        ]
      }
    ],
    "max_tokens": 2048
  }'
```
It works.
The proxy and gateway should support vision requests too, in addition to normal text-only requests.