These are documented under [Inferencing and Serving -> Production Metrics](project:../../serving/metrics.md).
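
As a quick way to see what these metrics look like in practice, here is a minimal sketch that scrapes a server's `/metrics` endpoint and lists the vLLM metric families. The host, port, and running server are assumptions for illustration, not part of the documented interface.

```python
# Illustrative sketch: list the vLLM metric families exposed by a local
# server. Assumes a vLLM OpenAI-compatible server is already running on
# localhost:8000 (e.g. `vllm serve <model> --port 8000`).
import requests
from prometheus_client.parser import text_string_to_metric_families

resp = requests.get("http://localhost:8000/metrics", timeout=5)
resp.raise_for_status()

for family in text_string_to_metric_families(resp.text):
    if family.name.startswith("vllm"):
        print(f"{family.name} ({family.type}): {family.documentation}")
```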
### Grafana Dashboard
vLLM also provides [a reference example](project:../../getting_started/examples/prometheus_grafana.md) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:
- `vllm:request_decode_time_seconds` - Requests Decode Time
- `vllm:request_max_num_generation_tokens` - Max Generation Token in Sequence Group
See [the PR which added this Dashboard](gh-pr:2316) for interesting and useful background on the choices made here.
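
For a concrete sense of how a dashboard panel consumes these metrics, the sketch below issues the kind of PromQL query a Grafana panel might use, computing mean request decode time over a five-minute window. The Prometheus address and the exact query are illustrative assumptions, not taken from the shipped dashboard.

```python
# Illustrative sketch: query Prometheus the way a Grafana panel would,
# computing mean decode time from the histogram's _sum and _count series.
# Assumes a Prometheus server scraping vLLM is reachable on localhost:9090.
import requests

query = (
    "rate(vllm:request_decode_time_seconds_sum[5m]) / "
    "rate(vllm:request_decode_time_seconds_count[5m])"
)
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": query},
    timeout=5,
)
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```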
### Prometheus Client Library
Prometheus support was initially added [using the aioprometheus library](gh-pr:1890), but was quickly switched to [prometheus_client](gh-pr:2730). The rationale is discussed in both linked PRs.
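
For orientation, this is roughly what defining and exposing metrics with `prometheus_client` looks like. The snippet is a standalone sketch that borrows vLLM-style metric names; it is not vLLM's actual instrumentation code.

```python
# Standalone prometheus_client sketch with vLLM-style metric names;
# not vLLM's actual instrumentation code.
from prometheus_client import Counter, Histogram, start_http_server

generation_tokens = Counter(
    "vllm:generation_tokens",  # exposed as vllm:generation_tokens_total
    "Number of generation tokens processed.",
    labelnames=["model_name"],
)
e2e_latency = Histogram(
    "vllm:e2e_request_latency_seconds",
    "End-to-end request latency in seconds.",
    labelnames=["model_name"],
)

start_http_server(8001)  # serve the /metrics endpoint on port 8001
generation_tokens.labels(model_name="my-model").inc(42)
e2e_latency.labels(model_name="my-model").observe(0.73)
```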
### Multi-process Mode
In v0, metrics are collected in the engine core process and we use multi-process mode to make them available in the API server process. See <gh-pr:7279>.
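
As a minimal sketch of what multi-process mode involves (assuming a simple worker/server split, which simplifies vLLM's actual process layout): each process writes its samples to files under `PROMETHEUS_MULTIPROC_DIR`, and the process serving `/metrics` aggregates them with a `MultiProcessCollector`.

```python
# Minimal prometheus_client multiprocess-mode sketch; not vLLM's code.
# PROMETHEUS_MULTIPROC_DIR must point to a writable directory before
# prometheus_client is imported in any participating process.
import os

os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/tmp/prom_multiproc")
os.makedirs(os.environ["PROMETHEUS_MULTIPROC_DIR"], exist_ok=True)

from prometheus_client import CollectorRegistry, Counter, generate_latest, multiprocess

# Worker side (e.g. the engine core process): record metrics as usual;
# the samples land in mmap'd files under PROMETHEUS_MULTIPROC_DIR.
requests_handled = Counter("app:requests_handled", "Requests handled.")
requests_handled.inc()

# Serving side (e.g. the API server process): aggregate every worker's
# files into one registry and render the combined exposition.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
print(generate_latest(registry).decode())
```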
### Built-in Python/Process Metrics
For background, these are some of the relevant PRs which added the v0 metrics:
Also note the ["Even Better Observability"](gh-issue:3616) feature, where, for example, [a detailed roadmap was laid out](gh-issue:3616#issuecomment-2030858781).
## v1 Design
### v1 PRs
For background, here are the relevant PRs relating to the v1 metrics implementation: