Add support for vLLM KV-cache quantization #773

eldarkurtic · 2025-05-22T10:08:47Z

This PR enables KV-cache quantization with the vLLM backend by letting users trigger it with the expected vLLM args: kv_cache_dtype and calculate_kv_scales. More details about these two args are available at https://docs.vllm.ai/en/stable/serving/engine_args.html#cacheconfig
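
For illustration, here is a minimal sketch of how the two args might be forwarded to vLLM. It is not the code in this PR's diff: the helper names `build_vllm_kwargs` and `load_engine` and the model id are hypothetical, and it assumes a vLLM version whose `LLM` constructor accepts both `kv_cache_dtype` and `calculate_kv_scales` as engine args.

```python
from typing import Any, Dict


def build_vllm_kwargs(
    model: str,
    kv_cache_dtype: str = "auto",
    calculate_kv_scales: bool = False,
) -> Dict[str, Any]:
    """Collect the engine args to forward to vLLM (hypothetical helper).

    kv_cache_dtype="auto" keeps the KV cache in the model's own dtype;
    a value such as "fp8" requests a quantized KV cache.
    calculate_kv_scales=True asks vLLM to compute the K/V scales at runtime
    instead of reading them from the checkpoint.
    """
    return {
        "model": model,
        "kv_cache_dtype": kv_cache_dtype,
        "calculate_kv_scales": calculate_kv_scales,
    }


def load_engine(kwargs: Dict[str, Any]):
    """Instantiate the vLLM engine (hypothetical helper).

    vLLM is imported lazily so that building the kwargs does not require
    vLLM to be installed.
    """
    from vllm import LLM  # assumes vLLM is installed and supports both args

    return LLM(**kwargs)


# "my-org/my-model" is a placeholder model id, not taken from this PR.
kwargs = build_vllm_kwargs(
    "my-org/my-model", kv_cache_dtype="fp8", calculate_kv_scales=True
)
```

Calling `load_engine(kwargs)` would then start vLLM with an FP8 KV cache, provided the installed vLLM version and the hardware support it.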

HuggingFaceDocBuilderDev · 2025-05-22T13:14:19Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Add support for vLLM KV-cache quantization

d00a7c4

Merge branch 'main' into patch-2

e97c5b9

3 participants