
sebastianschramm/benchmarks_vllm_NIM


Benchmarks of request throughput of vllm and NVIDIA NIM

Using vllm's benchmark script, I evaluated Llama-3-8B-Instruct served with vllm and with NVIDIA NIM, each deployed on AWS via Hugging Face Inference Endpoints on two different instance types: NVIDIA L4 and A10G GPUs.
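The exact invocation is not shown here; below is a hedged sketch of the kind of run vllm's `benchmark_serving.py` script supports against an OpenAI-compatible endpoint. The endpoint URL, model id, and prompt count are illustrative placeholders, not the values used to produce the numbers in this repo.

```shell
# Sketch of a request-throughput benchmark with vllm's benchmark_serving.py.
# --base-url, --model, and --num-prompts below are illustrative assumptions.
python benchmarks/benchmark_serving.py \
  --backend openai \
  --base-url https://<your-endpoint>.endpoints.huggingface.cloud \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --dataset-name sharegpt \
  --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
  --num-prompts 200 \
  --save-result
```

`--save-result` writes a JSON file with per-run metrics, which is the format of the benchmark files stored in this repo.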

Results

For the ShareGPT dataset on A10G, vllm has 18% higher request throughput compared to NIM.

Average request throughput (requests/s) over 2 runs:

| Hardware | vllm | NIM  |
|----------|------|------|
| L4       | 2.73 | 2.61 |
| A10G     | 4.13 | 3.49 |
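As a sanity check on the 18% figure, the relative gains can be recomputed directly from the table:

```python
# Average request throughput (requests/s), copied from the table above.
throughput = {
    "L4": {"vllm": 2.73, "nim": 2.61},
    "A10G": {"vllm": 4.13, "nim": 3.49},
}

for hw, t in throughput.items():
    gain = (t["vllm"] / t["nim"] - 1) * 100
    print(f"{hw}: vllm is {gain:.1f}% faster than NIM")
# → L4: vllm is 4.6% faster than NIM
# → A10G: vllm is 18.3% faster than NIM
```

So the headline 18% applies to the A10G run; on L4 the gap is a more modest ~5%.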

Request throughput comparison

See the benchmark JSON files in this repo for detailed per-run results.
