add SLA information into comparison graph for vLLM Benchmark Suite #25525
Code Review
This pull request adds SLA information to the vLLM benchmark comparison graphs and tables. The changes introduce new helper functions to draw limit lines on plots and highlight values in tables that meet the SLA. New command-line arguments are added to specify SLA thresholds. My review focuses on the implementation details of these new features. I've found a couple of issues: one related to file handling that could cause problems in CI, and another related to redundant code and logic in data sorting which affects correctness and maintainability. Please see my detailed comments.
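As a rough illustration of the threshold arguments and table highlighting summarized above, here is a minimal sketch. It is not the PR's code: the flag names `--ttft-max-ms` and `--tpot-max-ms`, the helpers `meets_sla` and `render_cell`, and the sample latencies are all assumptions made up for this example, and it assumes a value meets the SLA when it is at or below the limit.

```python
import argparse
from html import escape


def build_parser() -> argparse.ArgumentParser:
    # Flag names and defaults are hypothetical, chosen for this sketch only.
    parser = argparse.ArgumentParser(description="SLA highlighting sketch")
    parser.add_argument("--ttft-max-ms", type=float, default=3000.0)
    parser.add_argument("--tpot-max-ms", type=float, default=150.0)
    return parser


def meets_sla(value_ms: float, limit_ms: float) -> bool:
    # Latency metrics: lower is better, so the SLA is met at or below the limit.
    return value_ms <= limit_ms


def render_cell(value_ms: float, limit_ms: float) -> str:
    # Render one HTML table cell, colored green when the value meets the SLA.
    text = escape(f"{value_ms:.1f}")
    if meets_sla(value_ms, limit_ms):
        return f'<td style="color: green">{text}</td>'
    return f"<td>{text}</td>"


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--ttft-max-ms", "3000", "--tpot-max-ms", "150"])

# Illustrative latencies in milliseconds, not benchmark results.
ttft_cells = [render_cell(v, args.ttft_max_ms) for v in (1200.0, 3500.0)]
tpot_cells = [render_cell(v, args.tpot_max_ms) for v in (90.0, 150.0)]
```

With these made-up inputs, only the cells at or below the limits get the green style; whether the boundary value itself should count as meeting the SLA is a design choice this sketch makes, not something the PR description states.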
This pull request has merge conflicts that must be resolved before it can be |
Random Dataset |
Hi @louie-tsai, it looks like the DCO check needs a fix.
Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
…llm-project#25525) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…llm-project#25525) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
…o step_forward * 'step_forward' of https://github.com/raindaywhu/vllm: (148 commits) [Model] Add MoE support for NemotronH (vllm-project#25863) [Metrics] [KVConnector] Add connector prefix cache hit rate stats (vllm-project#26245) [CI] Reorganize entrypoints tests (vllm-project#27403) add SLA information into comparison graph for vLLM Benchmark Suite (vllm-project#25525) [CI/Build] Fix AMD CI: test_cpu_gpu.py (vllm-project#27388) [Bugfix] Fix args settings for guided decoding args (vllm-project#27375) [CI/Build] Fix Prithvi plugin test (vllm-project#27393) [Chore] Remove duplicate `has_` functions in vllm.utils (vllm-project#27372) [Model] Add num_cached_tokens for PoolingRequestOutput (vllm-project#27378) [V1][spec decode] return logprobs for spec decoding (vllm-project#26060) [CORE] Support Prefix Caching with Prompt Embeds (vllm-project#27219) [Bugfix][Core] running queue index leakage exception (vllm-project#26754) [Bugfix] Fix incorrect kv cache metrics in grafana.json (vllm-project#27133) [Bugfix] Fix SLA tuner initialization (vllm-project#27355) [Bugfix] Fix deepseek-ocr multi-image inference and add `merge_by_field_config=True` with tensor schema support (vllm-project#27361) [MLA] Bump FlashMLA (vllm-project#27354) [Chore] Separate out system utilities from vllm.utils (vllm-project#27201) [BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (vllm-project#27128) [Feature] publisher default set zmq in kv_event config (vllm-project#26915) [Prefix Cache] Use LoRA name for consistent KV-cache block hashing (vllm-project#27211) ...
…llm-project#25525) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
To make it easier to see whether TTFT and TPOT meet the defined SLA, this PR marks the SLA limits in the comparison graph and highlights in green the table values that meet the SLA.
This makes it easy for users to find the numbers that meet the SLA.
It also dumps vLLM environment information into vllm_env.txt.
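The description does not say how the environment information is collected, so the following is only a sketch of the idea under stated assumptions: the helpers `collect_env_lines` and `dump_env` are hypothetical, and they record generic Python and platform details plus the installed `vllm` version (if any) rather than whatever the PR actually writes.

```python
import platform
import sys
from importlib import metadata
from pathlib import Path


def collect_env_lines() -> list[str]:
    # Generic details only; the real contents of vllm_env.txt are not shown in the PR text.
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
    ]
    try:
        lines.append(f"vllm: {metadata.version('vllm')}")
    except metadata.PackageNotFoundError:
        lines.append("vllm: not installed")
    return lines


def dump_env(path: Path) -> Path:
    # Write one "key: value" entry per line and return the path for convenience.
    path.write_text("\n".join(collect_env_lines()) + "\n", encoding="utf-8")
    return path
```

Calling `dump_env(Path("vllm_env.txt"))` would write the file next to the benchmark results, so a comparison can be traced back to the environment that produced it.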
Test Plan
Tested manually.
Test Result
With an SLA of TTFT 3000 ms and TPOT 150 ms.
