[Feat] Benchmark trimming #1
Purpose
Support benchmark trimming with a user-defined time interval to obtain accurate decode-related metrics.
vLLM's current benchmark does not support trimming, which is essential for obtaining precise decode-related metrics. At the beginning of a benchmark run, the vLLM inference server cannot fully utilize its decode capacity: incoming requests must first go through prefill, so the server cannot immediately form large decode batches, and it takes some time to reach a fully loaded decode state. Similarly, at the end of the run, requests finish gradually and server utilization drops. The current benchmark does not distinguish this low-load data from the stable, high-load data in between, which skews the resulting decode metrics.
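To make the ramp-up and ramp-down concrete, here is a minimal sketch that counts in-flight requests over time from (start, end) timestamps. The helpers in_flight and concurrency_profile are hypothetical and illustrative only; they are not part of vLLM's benchmark code.

```python
def in_flight(requests: list[tuple[float, float]], t: float) -> int:
    """Number of requests that have started but not yet finished at time t."""
    return sum(1 for start, end in requests if start <= t < end)


def concurrency_profile(
    requests: list[tuple[float, float]], step: float
) -> list[tuple[float, int]]:
    """Sample in_flight() every `step` seconds from the first start to the last end."""
    t0 = min(start for start, _ in requests)
    t1 = max(end for _, end in requests)
    n_samples = int((t1 - t0) / step) + 1
    # Compute each sample time from its index to avoid accumulating float error.
    return [(t0 + i * step, in_flight(requests, t0 + i * step)) for i in range(n_samples)]


if __name__ == "__main__":
    # Four requests arriving 1 s apart, each lasting 10 s (hypothetical data).
    reqs = [(0.0, 10.0), (1.0, 11.0), (2.0, 12.0), (3.0, 13.0)]
    for t, n in concurrency_profile(reqs, 1.0):
        print(f"t={t:4.1f}s in-flight={n}")
```

Even in this toy case, concurrency only reaches its steady value of 4 after a ramp-up, and it falls off again at the end; metrics averaged over the whole run therefore mix those low-load edges with the steady-state middle.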
We have added the --warmup-time and --cooldown-time options to configure the effective time interval used for measurement. The effective time interval is calculated as:
(Effective time interval) = (Benchmark end time (E) - Cooldown time (c)) - (Benchmark start time (S) + Warmup time (w))
That is, measurement covers only the window [S + w, E - c].
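The following is a minimal sketch of how such a window could be applied to decode metrics, assuming per-request token emission timestamps are available. RequestTrace, effective_window, and trimmed_decode_metrics are hypothetical names, not this PR's actual implementation. Treating every token after the first as a decode token (the first token comes from prefill) is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTrace:
    # Hypothetical record: wall-clock time (s) at which each output token was emitted.
    token_times: list[float]


def effective_window(start: float, end: float, warmup: float, cooldown: float) -> tuple[float, float]:
    """Return the measurement window [S + w, E - c]."""
    lo = start + warmup
    hi = end - cooldown
    if hi <= lo:
        raise ValueError("warmup + cooldown must be shorter than the benchmark duration")
    return lo, hi


def trimmed_decode_metrics(
    traces: list[RequestTrace], start: float, end: float, warmup: float, cooldown: float
) -> dict[str, float]:
    lo, hi = effective_window(start, end, warmup, cooldown)
    decode_tokens = 0
    itls: list[float] = []
    for trace in traces:
        # Pair each token with its predecessor; token_times[0] (prefill output) is never counted.
        for prev, cur in zip(trace.token_times, trace.token_times[1:]):
            # Count a decode token only if the whole inter-token gap lies inside the window.
            if lo <= prev and cur <= hi:
                decode_tokens += 1
                itls.append(cur - prev)
    return {
        "window_s": hi - lo,  # (E - c) - (S + w)
        "decode_tok_per_s": decode_tokens / (hi - lo),
        "mean_itl_s": sum(itls) / len(itls) if itls else math.nan,
    }


if __name__ == "__main__":
    # Hypothetical run: S = 0 s, E = 100 s, w = 10 s, c = 10 s -> window [10, 90].
    traces = [RequestTrace([5.0, 12.0, 14.0, 16.0, 95.0])]
    print(trimmed_decode_metrics(traces, 0.0, 100.0, 10.0, 10.0))
```

In this example, the gap ending at 12 s starts during warmup and the gap ending at 95 s falls in the cooldown, so only the two gaps between 12 s and 16 s contribute to the trimmed metrics.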
Test Plan
Test Result
The trimmed benchmark results are appended at the bottom of the benchmark output. We can get