add SLA information into comparison graph for vLLM Benchmark Suite #25525

louie-tsai · 2025-09-24T00:49:03Z

Purpose

To make it easier to see whether TTFT and TPOT meet the defined SLA, this PR labels the relevant SLA threshold in each diagram and highlights in green the table values that meet it, so users can quickly find the numbers that satisfy the SLA.
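As a rough illustration of the table highlighting described above, here is a minimal sketch using only the standard library. The names `sla_style` and `render_row` are hypothetical and are not taken from the PR. Treating "meets the SLA" as "less than or equal to the threshold" is an assumption, and the default thresholds simply reuse the values quoted under Test Result.

```python
from html import escape

# CSS applied to cells whose latency is within the SLA (assumed styling).
GREEN = "background-color: lightgreen"


def sla_style(value_ms, sla_ms):
    """Return a CSS style string: green when value_ms is within sla_ms."""
    if value_ms is None or sla_ms is None:
        return ""
    # Assumption: a value equal to the threshold still counts as meeting it.
    return GREEN if value_ms <= sla_ms else ""


def render_row(label, ttft_ms, tpot_ms, ttft_sla_ms=3000.0, tpot_sla_ms=150.0):
    """Render one HTML table row with TTFT and TPOT cells, two decimals each."""
    cells = [(ttft_ms, ttft_sla_ms), (tpot_ms, tpot_sla_ms)]
    tds = "".join(
        f'<td style="{sla_style(v, sla)}">{v:.2f}</td>' for v, sla in cells
    )
    return f"<tr><td>{escape(str(label))}</td>{tds}</tr>"
```

With the default thresholds, `render_row("run-a", 2500.0, 200.0)` would mark the TTFT cell green and leave the TPOT cell unstyled.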

It also dumps the vLLM environment information into vllm_env.txt.
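One way to capture the environment dump could look like the sketch below. The helper name `dump_command_output` is hypothetical, and the PR may well do this from a shell script instead. The `vllm collect-env` command comes from the "Add vllm collect-env" commit title in this PR; it appears only in a comment so the snippet runs without vLLM installed.

```python
import subprocess
from pathlib import Path


def dump_command_output(cmd, out_path):
    """Run cmd, write its stdout and stderr to out_path, return the exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    Path(out_path).write_text(result.stdout + result.stderr)
    return result.returncode


# Assumed usage on a machine where the vllm CLI is available:
# dump_command_output(["vllm", "collect-env"], "vllm_env.txt")
```

Writing stderr alongside stdout keeps any warnings from the command in the same file, which can help when comparing benchmark runs across machines.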

Test Plan

Tested manually.

Test Result

Results with SLA thresholds of 3000 ms for TTFT and 150 ms for TPOT.


gemini-code-assist

Code Review

This pull request adds SLA information to the vLLM benchmark comparison graphs and tables. The changes introduce new helper functions to draw limit lines on plots and highlight values in tables that meet the SLA. New command-line arguments are added to specify SLA thresholds. My review focuses on the implementation details of these new features. I've found a couple of issues: one related to file handling that could cause problems in CI, and another related to redundant code and logic in data sorting which affects correctness and maintainability. Please see my detailed comments.
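The review mentions new command-line arguments for the SLA thresholds and helpers that draw limit lines. A minimal sketch of how such arguments might be wired up follows. The flag names `--ttft-sla-ms` and `--tpot-sla-ms` and the function names are hypothetical, not the PR's actual interface, and the defaults just mirror the thresholds quoted in the Test Result section. The sketch stops short of plotting and only returns the data a plotting backend would need for each horizontal limit line.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical SLA threshold flags."""
    parser = argparse.ArgumentParser(
        description="Compare benchmark results against SLA thresholds."
    )
    parser.add_argument("--ttft-sla-ms", type=float, default=3000.0,
                        help="TTFT SLA threshold in milliseconds (assumed flag).")
    parser.add_argument("--tpot-sla-ms", type=float, default=150.0,
                        help="TPOT SLA threshold in milliseconds (assumed flag).")
    return parser


def sla_limit_lines(args):
    """Return (metric, y_value, label) tuples, one per horizontal limit line."""
    thresholds = [("TTFT", args.ttft_sla_ms), ("TPOT", args.tpot_sla_ms)]
    return [
        (name, value, f"{name} SLA = {value:g} ms")
        for name, value in thresholds
    ]
```

For example, `sla_limit_lines(build_parser().parse_args(["--ttft-sla-ms", "2000"]))` yields one entry for TTFT at 2000 ms and one for TPOT at the default 150 ms.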

.buildkite/nightly-benchmarks/scripts/compare-json-results.py

mergify · 2025-10-07T20:25:03Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @louie-tsai.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

louie-tsai · 2025-10-14T00:57:33Z

Random Dataset: 128x128, 128x2048, 2048x128

bigPYJ1151 · 2025-10-22T06:37:20Z

Hi @louie-tsai, it looks like the DCO check needs to be fixed.

…llm-project#25525) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

mergify bot added ci/build performance Performance-related issues labels Sep 24, 2025

gemini-code-assist bot reviewed Sep 24, 2025

louie-tsai force-pushed the SLA_Graph branch 2 times, most recently from e515299 to 162bf05 Compare October 2, 2025 06:05

mergify bot added the needs-rebase label Oct 7, 2025

louie-tsai force-pushed the SLA_Graph branch 2 times, most recently from 1bb8bb5 to d6d0079 Compare October 9, 2025 22:34

mergify bot removed the needs-rebase label Oct 9, 2025

louie-tsai force-pushed the SLA_Graph branch 4 times, most recently from 4edceae to 7868429 Compare October 13, 2025 15:39

louie-tsai force-pushed the SLA_Graph branch 4 times, most recently from 3074ace to ba76dd7 Compare October 22, 2025 04:25

bigPYJ1151 approved these changes Oct 22, 2025

louie-tsai force-pushed the SLA_Graph branch from ba76dd7 to 39728f8 Compare October 23, 2025 06:24

louie-tsai added 8 commits October 22, 2025 23:26

quick fix

80a6fed

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

seperate model into different files

89d54aa

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

add SLA lines

1cd2a6f

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

highlight cell within SLA

e1bd425

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

sorted by x axis value

2500b6e

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

reduce TTFT SLA

28a696d

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

change to compare p99

b06aab2

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

make p99/median both available for latency

f80f0de

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

louie-tsai and others added 8 commits October 22, 2025 23:26

keep only 2 decimial

598edcb

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

fix for pre-commit

5c4b657

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

make the latency ratio >1

1bd8e3c

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

Add vllm collect-env

ed7d3b2

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

add TP4 test cases according to findings from AWS benchmarking

e2aaa8d

Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

change serving-test-cpu.json for R8i.24xlarge

92d187b

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

add two more use cases according to discussions

d831470

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

louie-tsai force-pushed the SLA_Graph branch from 39728f8 to d831470 Compare October 23, 2025 06:26

bigPYJ1151 enabled auto-merge (squash) October 23, 2025 06:41

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 23, 2025

bigPYJ1151 merged commit 3b7bdf9 into vllm-project:main Oct 23, 2025
19 checks passed
