Description
With the vLLM v1 engine enabling prefix caching by default, we need a way to consistently exercise and verify this from the client side. vLLM's built-in benchmarks already support prefix caching, so they can serve as a reference.
As the v1 release notes describe, the throughput improvements are considerable when the cache hit rate is high.
Linking to vLLM v1 blog: https://blog.vllm.ai/2025/01/27/v1-alpha-release.html#:~:text=3.%20Zero%2DOverhead%20Prefix%20Caching
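As a starting point, here is a minimal sketch of what such a client-side check could look like, assuming a locally running vLLM OpenAI-compatible server. The base URL, model name, and prompt sizes are placeholders, not part of this issue; the idea is simply to reuse a long shared prefix across requests and compare cold vs. warm latencies:

```python
# Minimal sketch of a client-side prefix-caching check against a vLLM
# OpenAI-compatible server. The server URL, model name, and prompt sizes
# below are assumptions; adjust them for the actual deployment.
import time
import requests

BASE_URL = "http://localhost:8000/v1/completions"   # assumed local vLLM server
MODEL = "meta-llama/Llama-3.1-8B-Instruct"          # assumed model name

# A long shared prefix: with prefix caching enabled, only the first
# request should pay the full prefill cost for these tokens.
shared_prefix = "You are a helpful assistant. " * 200

def timed_request(prompt: str) -> float:
    """Send one completion request and return its wall-clock latency."""
    start = time.perf_counter()
    resp = requests.post(
        BASE_URL,
        json={
            "model": MODEL,
            "prompt": prompt,
            "max_tokens": 16,
            "temperature": 0.0,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

# First request populates the prefix cache; subsequent requests with the
# same prefix but distinct suffixes should show noticeably lower latency.
cold = timed_request(shared_prefix + "Question 0: What is 2 + 2?")
warm = [
    timed_request(shared_prefix + f"Question {i}: What is {i} + {i}?")
    for i in range(1, 6)
]

print(f"cold (cache miss): {cold:.3f}s")
print(f"warm (cache hits): {sum(warm) / len(warm):.3f}s avg over {len(warm)} requests")
```

A real benchmark would track time-to-first-token and throughput rather than end-to-end latency, and sweep prefix length and hit rate, as the vLLM built-in benchmarks do.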
Metadata
Status: In progress