Description
With the vLLM v1 engine enabling prefix caching by default, we need a way to consistently exercise and verify this from the client side. vLLM's built-in benchmarks already support prefix caching, so they can serve as a reference.
As the v1 release notes describe, the throughput improvements are considerable when the cache hit rate is high.
Linking to vLLM v1 blog: https://blog.vllm.ai/2025/01/27/v1-alpha-release.html#:~:text=3.%20Zero%2DOverhead%20Prefix%20Caching
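As a starting point, here is a minimal sketch of what such a client-side check could look like, assuming a locally running vLLM OpenAI-compatible server. The base URL, model name, and prompt sizes are placeholders, not part of this issue; the idea is simply to reuse a long shared prefix across requests and compare cold vs. warm latencies:

```python
# Minimal sketch of a client-side prefix-caching check against a vLLM
# OpenAI-compatible server. The server URL, model name, and prompt sizes
# below are assumptions; adjust them for the actual deployment.
import time
import requests

BASE_URL = "http://localhost:8000/v1/completions"   # assumed local vLLM server
MODEL = "meta-llama/Llama-3.1-8B-Instruct"          # assumed model name

# A long shared prefix: with prefix caching enabled, only the first
# request should pay the full prefill cost for these tokens.
shared_prefix = "You are a helpful assistant. " * 200

def timed_request(prompt: str) -> float:
    """Send one completion request and return its wall-clock latency."""
    start = time.perf_counter()
    resp = requests.post(
        BASE_URL,
        json={
            "model": MODEL,
            "prompt": prompt,
            "max_tokens": 16,
            "temperature": 0.0,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

# First request populates the prefix cache; subsequent requests with the
# same prefix but distinct suffixes should show noticeably lower latency.
cold = timed_request(shared_prefix + "Question 0: What is 2 + 2?")
warm = [
    timed_request(shared_prefix + f"Question {i}: What is {i} + {i}?")
    for i in range(1, 6)
]

print(f"cold (cache miss): {cold:.3f}s")
print(f"warm (cache hits): {sum(warm) / len(warm):.3f}s avg over {len(warm)} requests")
```

A real benchmark would track time-to-first-token and throughput rather than end-to-end latency, and sweep prefix length and hit rate, as the vLLM built-in benchmarks do.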
Metadata
Status: In progress