[V1][Metrics] Add per-request prompt/generation_tokens histograms #12516
Conversation
Observe these values as requests are finished. Requires keeping track of generated tokens per-request to handle streaming delta updates.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
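The per-request tracking the description calls for can be sketched roughly like this (a minimal stand-in for illustration only; the actual class and field names in vLLM's stats code may differ):

```python
from collections import defaultdict


class PerRequestTokenTracker:
    """Accumulates generation tokens per request across streaming
    delta updates, and reports per-request totals at finish time."""

    def __init__(self):
        self._generation_tokens = defaultdict(int)
        # (prompt_tokens, generation_tokens) per finished request;
        # a real stat logger would feed these into histograms instead.
        self.finished_totals = []

    def on_delta(self, request_id, num_new_tokens):
        # Each streaming update only carries the newly generated
        # tokens, so the running total must be kept here.
        self._generation_tokens[request_id] += num_new_tokens

    def on_finished(self, request_id, num_prompt_tokens):
        # Observe the totals exactly once, when the request finishes.
        total = self._generation_tokens.pop(request_id)
        self.finished_totals.append((num_prompt_tokens, total))


tracker = PerRequestTokenTracker()
tracker.on_delta("req-0", 3)
tracker.on_delta("req-0", 2)
tracker.on_finished("req-0", num_prompt_tokens=10)
print(tracker.finished_totals)  # [(10, 5)]
```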
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
```diff
@@ -116,9 +131,42 @@ def log(self, scheduler_stats: SchedulerStats,
        self.counter_generation_tokens.inc(
            iteration_stats.num_generation_tokens)

        for finished_request in iteration_stats.finished_requests:
            self.histogram_num_prompt_tokens_request.observe(
```
QQ, in V0, we did:

```python
histogram.labels(**self.labels).observe(datum)
```

(line 541 in 0f657bd)

Do you know why this is or is not needed?
I'll hit automerge to unblock you.
`labels()` is a pretty expensive factory method and our labels aren't ever changing, so I'm building these labelled metrics in the `PrometheusStatLogger` constructor:

```python
self.histogram_num_generation_tokens_request = \
    prometheus_client.Histogram(
        name="vllm:request_generation_tokens",
        documentation="Number of generation tokens processed.",
        buckets=build_1_2_5_buckets(max_model_len),
        labelnames=labelnames).labels(*labelvalues)
```

Not so easy to spot it, though!
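For reference, the bucket helper used above can be approximated like this (a hypothetical reimplementation for illustration; vLLM's actual `build_1_2_5_buckets` may differ in detail). It produces the standard 1-2-5 logarithmic bucket sequence, capped at the model's maximum length:

```python
def build_1_2_5_buckets(max_value: int) -> list[int]:
    """Hypothetical sketch: yield 1, 2, 5, 10, 20, 50, ...
    up to and including max_value."""
    mantissas = [1, 2, 5]
    exponent = 0
    buckets: list[int] = []
    while True:
        for m in mantissas:
            value = m * 10**exponent
            if value > max_value:
                return buckets
            buckets.append(value)
        exponent += 1


print(build_1_2_5_buckets(100))  # [1, 2, 5, 10, 20, 50, 100]
```

Passing these buckets at construction time, and pre-binding the label values with `.labels(*labelvalues)`, means the hot path only ever calls `.observe()` on an already-labelled child metric.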
…lm-project#12516) Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Follow on from #12478, part of #10582