Description
I use benchmark_serving.py as the client, vLLM's api_server, and TGI's text_generation_server on the server side. The client command is listed below:
" python benchmark_serving.py --backend tgi/vllm --tokenizer /data/llama --dataset /data/ShareGPT_V3_unfiltered_cleaned_split.json --host 10.3.1.2 --port 8108 --num-prompts 1000"
Why do I get a result showing TGI is 2x better than vLLM?