[misc] Add Benchmark Automation Script #73

Merged
zhuofan1123 merged 11 commits into main from feat/scripts
Dec 4, 2025

@zhuofan1123 zhuofan1123 commented Dec 2, 2025

Add FlexKV Benchmark Automation Script

This PR adds an automated benchmark script (run_benchmark.sh) that launches a vLLM server with FlexKV and runs multi-turn conversation benchmarks. See scripts/README_zh.md for details.

Key Features

  • End-to-end automation: Dataset preparation, server launch, benchmark execution, and cleanup
  • Flexible configuration: Supports customizable vLLM, FlexKV, and benchmark parameters
  • Optional profiling: Integrated Nsight Systems profiling support
  • Comprehensive logging: Server logs, benchmark results, and profiling reports with timestamps
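
The prepare, launch, benchmark, cleanup flow listed above could be structured roughly as follows. This is only a sketch, not the contents of run_benchmark.sh: every function name here is hypothetical, and a background `sleep` stands in for the real vLLM server process.

```shell
# Sketch of an end-to-end run with guaranteed cleanup.
# All names are hypothetical; `sleep` is a placeholder for the server.

SERVER_PID=""

prepare_dataset() { echo "prepare dataset"; }

launch_server() {
  # Placeholder for the real server command, started in the background.
  sleep 30 >/dev/null 2>&1 &
  SERVER_PID=$!
  echo "server started"
}

run_bench() { echo "run benchmark"; }

cleanup() {
  # Stop the background server if it is still alive.
  if [ -n "$SERVER_PID" ] && kill -0 "$SERVER_PID" 2>/dev/null; then
    kill "$SERVER_PID"
    wait "$SERVER_PID" 2>/dev/null
  fi
  SERVER_PID=""
  echo "cleanup done"
}

# The function body is a subshell, so the EXIT trap fires when the
# pipeline finishes or aborts, without touching the caller's traps.
run_pipeline() (
  trap cleanup EXIT
  prepare_dataset
  launch_server
  run_bench
)
```

Calling `run_pipeline` runs the stages in order. Registering `cleanup` on EXIT rather than calling it as a final step means the server is stopped even when an earlier stage exits early.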

Usage

bash scripts/run_benchmark.sh --vllm-path <path> --model-path <path> [options]
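
For illustration, a script with this interface might parse the two required flags as below. The flag names come from the usage line; the variable names, error messages, and the choice to ignore other options are assumptions of this sketch, not the behavior of run_benchmark.sh.

```shell
# Hypothetical parser for --vllm-path and --model-path.
# Results are stored in VLLM_PATH and MODEL_PATH (names assumed here).
parse_args() {
  VLLM_PATH=""
  MODEL_PATH=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --vllm-path|--model-path)
        # Guard against a flag given without its value.
        if [ "$#" -lt 2 ]; then
          echo "error: $1 requires a value" >&2
          return 2
        fi
        case "$1" in
          --vllm-path)  VLLM_PATH=$2 ;;
          --model-path) MODEL_PATH=$2 ;;
        esac
        shift 2
        ;;
      *)
        # Other options are skipped in this sketch.
        shift
        ;;
    esac
  done
  # Both paths are required by the usage line.
  if [ -z "$VLLM_PATH" ] || [ -z "$MODEL_PATH" ]; then
    echo "error: --vllm-path and --model-path are required" >&2
    return 2
  fi
}
```

A caller would invoke `parse_args "$@"` at the top of the script and stop if it returns non-zero.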

Output

Generated log files in the log directory:

  • vllm_server_YYYYMMDD_HHMMSS.log
  • benchmark_YYYYMMDD_HHMMSS.log
  • vllm_profile_YYYYMMDD_HHMMSS.nsys-rep (if profiling enabled)
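
File names of this shape can be produced with a single `date` call, as in the sketch below. `LOG_DIR`, `make_log_names`, and the output variable names are hypothetical, and sharing one timestamp across all three files is an assumption of this example rather than something the PR states.

```shell
# Build timestamped log paths in the YYYYMMDD_HHMMSS style listed above.
# LOG_DIR and the variable names are hypothetical.
LOG_DIR="${LOG_DIR:-./logs}"

make_log_names() {
  # One timestamp per run so the files of a run sort together.
  ts=$(date +%Y%m%d_%H%M%S)
  SERVER_LOG="$LOG_DIR/vllm_server_${ts}.log"
  BENCH_LOG="$LOG_DIR/benchmark_${ts}.log"
  PROFILE_REPORT="$LOG_DIR/vllm_profile_${ts}.nsys-rep"
}

make_log_names
printf '%s\n' "$SERVER_LOG" "$BENCH_LOG" "$PROFILE_REPORT"
```

Taking the timestamp once, instead of calling `date` per file, avoids suffixes that differ by a second within the same run.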

@linhu-nv linhu-nv left a comment

LGTM. Thanks!

@peaceforeverCN peaceforeverCN left a comment

LGTM

@zhuofan1123 zhuofan1123 merged commit 17180d1 into main Dec 4, 2025
0 of 2 checks passed
Luis-xu pushed a commit to peaceforeverCN/FlexKV that referenced this pull request Dec 26, 2025
* add KVCacheEngineClient APIs

* basic implementation for KVCacheEngineClient

* initial transfer manager

* init transfer handle

* init kv engine

* refactor kvmanager

* update kvmanager

* some refactor

* kv response

* add benchmark

* serialize graph

* fix bugs

* ready check

* update

* rename

* rename benchmark

* use numpy instead of tensor

* small fix

* remove transfer descriptor

* rename to kvmanager

* update api

* add gpu-kvcache-verifier, draft

* update

* create a new tp-worker process and create gpu blocks for verification

* rename

* the test_kvmanager works now

* fix virtual op initialize

* fix verifier bug when tp > 1 and mla enabled

* fix

* remove task id && some fix

* only create one h2d op

* pass slotmapping for launch

* quick fix

---------

Co-authored-by: linhu-nv <linhu@nvidia.com>
Co-authored-by: Fei Liang <hanyueh@nvidia.com>

3 participants