
TGI performance is better than vllm on A800 #262

Closed as not planned

Description

@jameswu2014

I am using benchmark_serving.py as the client, api_server for vLLM, and text_generation_server for TGI. The client command is:
"python benchmark_serving.py --backend tgi/vllm --tokenizer /data/llama --dataset /data/ShareGPT_V3_unfiltered_cleaned_split.json --host 10.3.1.2 --port 8108 --num-prompts 1000"

Why do I get results showing that TGI is 2x better than vLLM?

Metadata

Labels: bug (Something isn't working)
