# GuideLLM Benchmark Testing Best Practice

Run your first GuideLLM benchmark test from scratch using the vLLM simulator.

## Getting Started

### 📦 1. Benchmark Testing Environment Setup

#### 1.1 Create a Conda Environment (recommended)

```bash
conda create -n guidellm-bench python=3.11 -y
conda activate guidellm-bench
```

#### 1.2 Install Dependencies

```bash
git clone https://github.com/vllm-project/guidellm.git
cd guidellm
pip install -e .
```

For more detailed instructions, refer to the [GuideLLM README](https://github.com/vllm-project/guidellm/blob/main/README.md).

#### 1.3 Verify Installation

```bash
guidellm --help
```

#### 1.4 Start an OpenAI-compatible API in the vLLM simulator Docker container

```bash
docker pull ghcr.io/llm-d/llm-d-inference-sim:v0.4.0

docker run --rm --publish 8000:8000 \
ghcr.io/llm-d/llm-d-inference-sim:v0.4.0 \
--port 8000 \
--model "Qwen/Qwen2.5-1.5B-Instruct" \
--lora-modules '{"name":"tweet-summary-0"}' '{"name":"tweet-summary-1"}'
```

For more detailed instructions, refer to the [vLLM simulator documentation](https://llm-d.ai/docs/architecture/Components/inference-sim).

Available Docker image versions: [Docker Images](https://github.com/llm-d/llm-d-inference-sim/pkgs/container/llm-d-inference-sim)

Check that the OpenAI-compatible API is working via curl:

- check `/v1/models`

```bash
curl --request GET 'http://localhost:8000/v1/models'
```

- check `/v1/chat/completions`

```bash
curl --request POST 'http://localhost:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "tweet-summary-0",
    "stream": false,
    "messages": [{"role": "user", "content": "Say this is a test!"}]
}'
```

- check `/v1/completions`

```bash
curl --request POST 'http://localhost:8000/v1/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "tweet-summary-0",
    "stream": false,
    "prompt": "Say this is a test!",
    "max_tokens": 128
}'
```
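
The same checks can be scripted. A small sketch that builds the chat-completions request body and extracts the reply from an OpenAI-style response (helper names are illustrative; only the request/response shapes come from the checks above):

```python
import json


def chat_payload(model, user_content, stream=False):
    """Build the JSON body for a POST to /v1/chat/completions."""
    return {
        "model": model,
        "stream": stream,
        "messages": [{"role": "user", "content": user_content}],
    }


def extract_reply(response):
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


# Serialized body, ready to send to http://localhost:8000/v1/chat/completions
body = json.dumps(chat_payload("tweet-summary-0", "Say this is a test!"))
```

`body` can then be POSTed with `urllib.request` or `requests` using the same URL and `Content-Type: application/json` header as the curl check.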
|  | 80 | + | 
|  | 81 | +#### 1.5 Download Tokenizer | 
|  | 82 | + | 
|  | 83 | +Download Qwen/Qwen2.5-1.5B-Instruct tokenizer files from [Qwen/Qwen2.5-1.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-1.5B-Instruct/files) save to local path such as ${local_path}/Qwen2.5-1.5B-Instruct | 
|  | 84 | + | 
|  | 85 | +```bash | 
|  | 86 | +ls ./Qwen2.5-1.5B-Instruct | 
|  | 87 | +merges.txt              tokenizer.json          tokenizer_config.json   vocab.json | 
|  | 88 | +``` | 
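
A quick sanity check that all four files landed in the right place; the helper below is a hypothetical convenience, and the file list simply mirrors the `ls` output above:

```python
from pathlib import Path

# Files the tokenizer directory is expected to contain (per the ls output above)
EXPECTED_FILES = ["merges.txt", "tokenizer.json", "tokenizer_config.json", "vocab.json"]


def missing_tokenizer_files(tokenizer_dir):
    """Return the expected tokenizer files that are absent from `tokenizer_dir`."""
    d = Path(tokenizer_dir)
    return [name for name in EXPECTED_FILES if not (d / name).is_file()]
```

If `missing_tokenizer_files("${local_path}/Qwen2.5-1.5B-Instruct")` returns an empty list, the path is ready to pass to `--processor` below.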
|  | 89 | + | 
|  | 90 | +______________________________________________________________________ | 
|  | 91 | + | 
|  | 92 | +## 🚀 2. Running Benchmarks | 
|  | 93 | + | 
|  | 94 | +```bash | 
|  | 95 | +guidellm benchmark \ | 
|  | 96 | +--target "http://localhost:8000/" \ | 
|  | 97 | +--model "tweet-summary-0" \ | 
|  | 98 | +--processor "${local_path}/Qwen2.5-1.5B-Instruct" \ | 
|  | 99 | +--rate-type sweep \ | 
|  | 100 | +--max-seconds 10 \ | 
|  | 101 | +--max-requests 10 \ | 
|  | 102 | +--data "prompt_tokens=128,output_tokens=56" | 
|  | 103 | +``` | 
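
With `--rate-type sweep`, GuideLLM first measures the lowest (synchronous) and highest (throughput) achievable request rates, then benchmarks at rates spread between those bounds. A conceptual sketch of that spacing (the helper is illustrative only; GuideLLM's actual sweep logic may differ):

```python
def sweep_rates(min_rate, max_rate, n=10):
    """Evenly spaced request rates (req/s) between the measured bounds, inclusive."""
    if n < 2:
        return [min_rate]
    step = (max_rate - min_rate) / (n - 1)
    return [min_rate + i * step for i in range(n)]
```

For example, if the synchronous run sustains 1 req/s and the throughput run sustains 10 req/s, a 10-point sweep would test 1.0, 2.0, ..., 10.0 req/s.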

______________________________________________________________________

## 📊 3. Results Interpretation

After the benchmark completes, the key metrics to look at are:

- **`TTFT`** (Time to First Token): the time from sending a request until the first output token arrives.
- **`TPOT`** (Time Per Output Token): the average time to generate each output token after the first.
- **`ITL`** (Inter-Token Latency): the latency between consecutive output tokens.

Your first benchmark test is now complete.
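
The three metrics above can be computed directly from per-token arrival timestamps. A minimal sketch under the standard definitions (the function and its inputs are illustrative, not GuideLLM internals):

```python
def latency_metrics(request_start, token_times):
    """Compute (TTFT, ITLs, TPOT) in seconds from token arrival timestamps.

    `token_times` are absolute times at which each output token arrived,
    in order; `request_start` is when the request was sent.
    """
    # TTFT: delay from request submission to the first token
    ttft = token_times[0] - request_start
    # ITL: gaps between consecutive tokens
    itls = [b - a for a, b in zip(token_times, token_times[1:])]
    # TPOT: average generation time per token after the first
    if len(token_times) > 1:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return ttft, itls, tpot
```

For a request sent at t=0 whose tokens arrive at 0.5 s, 0.6 s, 0.8 s, and 1.1 s, TTFT is 0.5 s, the ITLs are 0.1, 0.2, and 0.3 s, and TPOT is 0.2 s.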