vLLM in Docker

Note: For Kubernetes deployments, see the vLLM production stack.

Usage

Prepare

cp tmp.env .env

In .env, set HUGGING_FACE_HUB_TOKEN and any other required fields.
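
Before starting the stack, it can help to confirm that the token actually made it into .env. The helper below is a minimal sketch: check_hf_token is a hypothetical name, not something this repository provides, and it assumes the file uses plain KEY=value lines.

```shell
# Hypothetical helper: succeed only if the env file (default: .env)
# contains an uncommented, non-empty HUGGING_FACE_HUB_TOKEN assignment.
check_hf_token() {
  env_file="${1:-.env}"
  # Require at least one character after the "=" sign.
  grep -Eq '^HUGGING_FACE_HUB_TOKEN=.+' "$env_file"
}
```

For example: check_hf_token .env || echo "HUGGING_FACE_HUB_TOKEN is missing or empty"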

Run

docker compose up -d

Test

curl -X POST "http://localhost:8080/vllms/qwen2.5-0.5b/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
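
The request path appears to follow the pattern /vllms/<service>/<path>, where <service> is the Compose service name (qwen2.5-0.5b here). The sketch below wraps the call above so the service name and request body can vary. vllm_chat_url and vllm_chat are hypothetical helper names, and the host and port are assumed to be the defaults used in this README.

```shell
# Hypothetical helpers; host and port assume the default proxy mapping (8080).

# Print the chat completions URL for the service named in $1.
vllm_chat_url() {
  printf 'http://localhost:8080/vllms/%s/v1/chat/completions\n' "$1"
}

# POST the JSON request body in $2 to the service named in $1.
vllm_chat() {
  curl -sS -X POST "$(vllm_chat_url "$1")" \
    -H "Content-Type: application/json" \
    --data "$2"
}
```

For example: vllm_chat qwen2.5-0.5b '{"messages":[{"role":"user","content":"What is the capital of France?"}]}'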

Add Model

Important: Refer to the vLLM documentation on supported models and engine arguments when choosing a model and its flags.

Edit docker-compose.yml

...
services:
  proxy: <...>

+ deepseek-r1:
+   <<: *base-vllm
+   command:
+     - --task=generate
+     - --model=jakiAJK/DeepSeek-R1-Distill-Qwen-1.5B_GPTQ-int4
+     - --quantization=gptq
+     - --dtype=float16
+     - --max-model-len=65536
+     - --gpu-memory-utilization=0.8
...

Test Model

curl -X POST "http://localhost:8080/vllms/deepseek-r1/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'

Update prometheus.yml

...
scrape_configs:
  - job_name: vllm-job
    static_configs:
      - targets:
          - qwen2.5-0.5b:8000
          - user-bge-m3:8000
+         - deepseek-r1:8000
...

Basic Auth

Create Credentials

htpasswd -cb docker/nginx.htpasswd YOUR_USER YOUR_PASSWORD

Update docker-compose.yml

...
services:
  proxy:
    image: nginx:${NGINX_TAG:-1.27-alpine}
    volumes:
      - ./docker/nginx.conf:/etc/nginx/nginx.conf:ro
+     - ./docker/nginx.htpasswd:/etc/nginx/.htpasswd:ro
    ports:
      - 8080:8080
...

Update nginx.conf

...
http {
    resolver 127.0.0.11 valid=10s;

    server {
        listen 8080;

+       auth_basic "Restricted Area";
+       auth_basic_user_file /etc/nginx/.htpasswd;

        location ~ ^/vllms/(?<model>[^/]+?)/(?<path>.*)$ {
...
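
Once auth_basic is enabled, requests without credentials should be rejected with 401, so clients need to send the user and password created above. With curl, adding -u YOUR_USER:YOUR_PASSWORD to the test commands is enough; it sends an Authorization: Basic header carrying the Base64 encoding of user:password. The sketch below builds that header by hand for clients that have no equivalent option. basic_auth_header is a hypothetical helper and assumes a base64 command is available.

```shell
# Hypothetical helper: print the HTTP Basic auth header for user $1, password $2.
basic_auth_header() {
  # Base64-encode "user:password"; tr removes any line wrapping base64 adds.
  printf 'Authorization: Basic %s\n' \
    "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}
```

For example: curl -H "$(basic_auth_header YOUR_USER YOUR_PASSWORD)" followed by the rest of a test command from above.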

Scaling

Update docker-compose.yml

...
services:
  proxy: <...>

  qwen2.5-0.5b:
    <<: *base-vllm
+   scale: 2
    command:
      - --task=generate
      - --model=Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4
...

Observability

Launch

docker compose -f docker-compose.obs.yml up -d

Access

Go to http://localhost:3000 (login: admin/admin).
