Commit a2ec1f5
[V0][Metrics] Deprecate some KV/prefix cache metrics
vllm:num_requests_swapped, vllm:cpu_cache_usage_perc and
vllm:cpu_prefix_cache_hit_rate will no longer be relevant in
V1 since we no longer implement KV cache offloading. So
these metrics should be considered deprecated.
As agreed in vllm-project#12592, we have added prefix_cache_queries and
prefix_cache_hits counters to replace the prefix_cache_hit_rate
gauge. Unlike a gauge, counters let the interval over which the
hit rate is calculated be controlled in a Prometheus query like:
```
rate(prefix_cache_hits[5m]) / rate(prefix_cache_queries[5m])
```
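To illustrate why a pair of counters is more flexible than a precomputed gauge, here is a minimal Python sketch of the same calculation done by hand. It is not vLLM or Prometheus code: the `CounterSample` type, the `interval_hit_rate` helper, and the sample values are all hypothetical, and unlike Prometheus `rate()` the sketch does not handle counter resets.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CounterSample:
    """One scrape of the two monotonically increasing counters (hypothetical type)."""
    timestamp: float  # seconds
    queries: float    # cumulative prefix cache queries
    hits: float       # cumulative prefix cache hits


def interval_hit_rate(samples: Sequence[CounterSample],
                      window_s: float) -> Optional[float]:
    """Hit rate over the trailing window, computed from counter deltas.

    Returns None when there are no samples or no queries in the window.
    Assumes samples are sorted by timestamp and counters never reset.
    """
    if not samples:
        return None
    latest = samples[-1]
    # Oldest sample that still falls inside the trailing window.
    first = next(s for s in samples
                 if latest.timestamp - s.timestamp <= window_s)
    delta_queries = latest.queries - first.queries
    delta_hits = latest.hits - first.hits
    if delta_queries <= 0:
        return None
    return delta_hits / delta_queries


# Made-up scrapes taken one minute apart.
samples = [
    CounterSample(0, 0, 0),
    CounterSample(60, 100, 20),
    CounterSample(120, 200, 90),
    CounterSample(180, 300, 180),
]

print(interval_hit_rate(samples, 60))   # hit rate over the last minute
print(interval_hit_rate(samples, 180))  # hit rate over the last three minutes
```

The same raw counters yield a different hit rate for each window, and the window is chosen at query time rather than fixed by the server, which is what a single gauge cannot offer.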
In theory, we could ease the transition by implementing the
old hit rate metric in V1 and the new queries/hits metrics
in V0, but it's probably not worthwhile unless we learn the
hit rate metric is heavily used by V0 users.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>

1 parent e584b85, commit a2ec1f5
1 file changed, +25 -5 lines changed