Description
Improve on https://github.com/tenstorrent/tt-inference-server/blob/main/benchmarking/vllm_online_benchmark.py by adding a script that handles the full setup end to end, e.g. as described in https://gist.github.com/milank94/1b7c31556f5f6a13c56553e2cccc5823.
Requirements
- a single bash command to set up and run benchmarks (vLLM server + client-side requests); see the wrapper sketch below
- can be run either in the tt-inference-server Docker image or from an existing tt-metal development environment
- output is stored so that each run can be located and its data extracted easily
- can be parameterized over MESH_DEVICE={N150, N300, T3K} as supported by tt-transformers
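A minimal sketch of such a wrapper, assuming a hypothetical script name `run_benchmark.sh`, the default vLLM port 8000 with its `/health` endpoint, and a placeholder server launch command (`SERVER_CMD`) since the actual launch step differs between the Docker image and a tt-metal dev environment; only `benchmarking/vllm_online_benchmark.py` is taken from the repo:

```bash
#!/usr/bin/env bash
# run_benchmark.sh -- hypothetical wrapper sketch, not the final interface.
set -euo pipefail

# Parameterize over the mesh device (values supported by tt-transformers).
MESH_DEVICE="${MESH_DEVICE:-N150}"
case "$MESH_DEVICE" in
  N150|N300|T3K) ;;
  *) echo "Unsupported MESH_DEVICE: $MESH_DEVICE" >&2; exit 1 ;;
esac
export MESH_DEVICE

# Store each run under a unique timestamped directory so results are easy
# to find and extract later.
RUN_DIR="benchmark_results/${MESH_DEVICE}_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$RUN_DIR"

# Start the vLLM server in the background. SERVER_CMD is a placeholder:
# substitute the real launch command for the Docker image or tt-metal env.
SERVER_CMD="${SERVER_CMD:-python run_vllm_server.py}"
$SERVER_CMD > "$RUN_DIR/server.log" 2>&1 &
SERVER_PID=$!
trap 'kill "$SERVER_PID" 2>/dev/null || true' EXIT

# Wait until the server answers its health endpoint before sending requests.
until curl -sf "http://localhost:8000/health" > /dev/null; do
  sleep 5
done

# Run the client-side benchmark, capturing output alongside the server log.
python benchmarking/vllm_online_benchmark.py | tee "$RUN_DIR/client.log"

echo "Results stored in $RUN_DIR"
```

Usage would then be a single command, e.g. `MESH_DEVICE=T3K ./run_benchmark.sh`, with all artifacts from that run (server log, client output) collected under one timestamped directory.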