
[Feature Request] Testing with defined prefix lengths #104

@thameem-abbas

Description


With the vLLM v1 engine enabling prefix caching by default, we need a way to consistently test it from the client side. vLLM's built-in benchmarks already support this, so they can serve as a reference.

As we know, the throughput improvements are considerable with a good cache hit rate.

Linking to vLLM v1 blog: https://blog.vllm.ai/2025/01/27/v1-alpha-release.html#:~:text=3.%20Zero%2DOverhead%20Prefix%20Caching
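To exercise prefix caching from the client side, requests need to share a prefix of a controlled length. A minimal sketch of such a prompt generator is below; the function name and parameters are hypothetical, not part of any existing tool, and prefix length is approximated in whitespace-delimited words rather than model tokens (a real implementation would measure the prefix with the model's tokenizer).

```python
import random


def build_prefix_prompts(num_prompts: int,
                         prefix_words: int,
                         suffix_words: int,
                         seed: int = 0) -> list[str]:
    """Build prompts that share a synthetic prefix of a defined length.

    All prompts start with the same randomly generated prefix of
    `prefix_words` words, followed by a unique random suffix of
    `suffix_words` words, so a server with prefix caching enabled
    should get cache hits on the shared portion.
    """
    rng = random.Random(seed)
    vocab = [f"tok{i}" for i in range(1000)]

    # Shared prefix: identical across all prompts.
    prefix = " ".join(rng.choices(vocab, k=prefix_words))

    prompts = []
    for _ in range(num_prompts):
        # Unique suffix: forces fresh computation past the cached prefix.
        suffix = " ".join(rng.choices(vocab, k=suffix_words))
        prompts.append(f"{prefix} {suffix}")
    return prompts
```

Sweeping `prefix_words` (e.g. 0, 128, 512) while holding total prompt length fixed would let the benchmark report throughput as a function of the cacheable fraction.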

Metadata

Status: In progress