This is a simple example that shows you how to connect vLLM metric logging to the Prometheus/Grafana stack. For this example, we launch Prometheus and Grafana via Docker. You can check out other deployment methods on the [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/) websites.
Install:

- [`docker`](https://docs.docker.com/engine/install/)
- [`docker compose`](https://docs.docker.com/compose/install/)
Prometheus metric logging is enabled by default in the OpenAI-compatible server. Launch via the entrypoint:

```bash
vllm serve mistralai/Mistral-7B-v0.1 \
    --max-model-len 2048 \
    --disable-log-requests
```
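Once the server reports it is ready, you can verify that metrics are being exported before wiring up Prometheus. This assumes the default port 8000 and vLLM's `vllm:`-prefixed metric names:

```bash
# Peek at the raw Prometheus metrics exposed by the server.
curl -s http://localhost:8000/metrics | grep "^vllm:" | head
```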
Launch Prometheus and Grafana servers with `docker compose`:

```bash
docker compose up
```
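The two compose services' names double as DNS hostnames on the compose network, which matters when configuring the Grafana datasource later. As a rough sketch of what such a compose file contains (image tags and options here are illustrative, not the exact contents of this example's `docker-compose.yaml`):

```yaml
# Illustrative sketch; see the docker-compose.yaml in this directory
# for the authoritative version.
services:
  prometheus:
    image: prom/prometheus:latest
    extra_hosts:
      - "host.docker.internal:host-gateway"  # reach vLLM on the host
    ports:
      - "9090:9090"                          # Prometheus UI
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"                          # Grafana UI
```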
Submit some sample requests to the server:

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

python3 ../../benchmarks/benchmark_serving.py \
    --model mistralai/Mistral-7B-v0.1 \
    --tokenizer mistralai/Mistral-7B-v0.1 \
    --endpoint /v1/completions \
    --dataset-name sharegpt \
    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
    --request-rate 3.0
```
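If you only want a quick sanity check rather than the full benchmark, a single completion request is enough to move the metrics:

```bash
# One OpenAI-compatible completion request against the local server.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "mistralai/Mistral-7B-v0.1",
          "prompt": "San Francisco is a",
          "max_tokens": 32
        }'
```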
Navigating to `http://localhost:8000/metrics` will show the raw Prometheus metrics being exposed by vLLM.
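Prometheus finds that endpoint through its scrape configuration. Since Prometheus runs in a container while vLLM runs on the host, the scrape target points at the Docker host rather than `localhost`. A minimal sketch of such a `prometheus.yaml` (the interval is an illustrative choice):

```yaml
# Illustrative scrape configuration for the vLLM metrics endpoint.
global:
  scrape_interval: 5s               # how often to pull /metrics

scrape_configs:
  - job_name: vllm
    static_configs:
      - targets:
          - "host.docker.internal:8000"  # vLLM on the Docker host
```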
Navigate to `http://localhost:3000`. Log in with the default username (`admin`) and password (`admin`).
Navigate to `http://localhost:3000/connections/datasources/new` and select Prometheus.
On the Prometheus configuration page, add the `Prometheus Server URL` under `Connection`. In this setup, Grafana and Prometheus run in separate containers, but Docker's internal DNS resolves each compose service by name, so you can simply use `http://prometheus:9090`.
Click `Save & Test`. You should get a green check saying "Successfully queried the Prometheus API."
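If you would rather script this step than click through the UI, Grafana's HTTP API can create the same datasource (using the default `admin`/`admin` credentials from above):

```bash
# Create the Prometheus datasource via Grafana's REST API.
curl -s -u admin:admin http://localhost:3000/api/datasources \
    -H "Content-Type: application/json" \
    -d '{
          "name": "prometheus",
          "type": "prometheus",
          "url": "http://prometheus:9090",
          "access": "proxy"
        }'
```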
Navigate to `http://localhost:3000/dashboard/import`, upload `grafana.json`, and select the `prometheus` datasource. The imported dashboard should load and start displaying vLLM's metrics.
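To confirm metrics are flowing end to end, you can also query Prometheus's HTTP API directly. A quick check, assuming `vllm:num_requests_running` is among the exported gauges (metric names can vary across vLLM versions):

```bash
# Ask Prometheus (exposed on the host at 9090) for one vLLM gauge.
curl -s 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=vllm:num_requests_running'
```

A JSON response with a non-empty `result` array means Prometheus is scraping vLLM successfully.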