Description
### Your current environment
image info
Client start command:

```shell
sudo python3 ./bench_serving.py --backend vllm --dataset-name random --model deepseek-r1 --tokenizer ./tokenizer --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --random-input-len 6000 --random-output-len 1000 --random-range-ratio 1 --request-rate 16 --max-concurrency 16 --num-prompts 80 --base-url $BASE_URL --host 0.0.0.0 --port 8000 --profile
```
Server start command:

```shell
VLLM_USE_V1=1 VLLM_TORCH_PROFILER_DIR=/disc vllm serve /root/.cache/huggingface --tensor-parallel-size 16 --trust-remote-code --gpu-memory-utilization 0.9 --max-model-len 32768 --enforce-eager --enable-reasoning --reasoning-parser deepseek_r1 --served-model-name deepseek-r1
```
Error info:
### 🐛 Describe the bug
The error info is shown in the first block above.
### Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.