[Prefix Cache] Use LoRA name for consistent KV-cache block hashing #27211
Conversation
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Code Review
This pull request is a great improvement for ensuring deterministic KV-cache block hashing when using LoRA adapters. Replacing the non-deterministic integer ID with the LoRA name is a solid approach to enable reliable cache sharing across different vLLM instances. The added tests and performance profiling are appreciated and confirm the correctness and low overhead of the change.
I've added a couple of suggestions to make the implementation more robust by handling empty LoRA names, which could otherwise lead to cache collisions.
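A minimal sketch of the kind of guard being suggested, assuming a helper such as `lora_name_for_hash` (the name and placement are illustrative, not vLLM's actual API):

```python
# Illustrative sketch only: validate the LoRA name before it becomes part of
# the block-hash key, so an empty name cannot collide with the no-LoRA case.
from typing import Optional


def lora_name_for_hash(lora_name: Optional[str]) -> Optional[str]:
    """Return the LoRA component of the block-hash key, or None for base-model runs."""
    if lora_name is None:
        return None  # request runs on the base model, no extra key needed
    if not lora_name:
        # An empty string would hash like "no adapter at all" (or collide with
        # another adapter whose name is also empty), silently sharing blocks.
        raise ValueError("LoRA adapters must have a non-empty name")
    return lora_name
```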
💡 Codex Review
Here are some automated review suggestions for this pull request.
Thanks for the contribution!
Purpose
When loading a new LoRA adapter, vLLM assigns it a unique LoRA integer ID using an atomic counter (ref 1, ref 2).
This ID is then included in the KV-Cache block hash along with the block tokens and other keys.
However, since the LoRA integer ID depends on the registration order, the resulting hashes are inconsistent across runs or instances — making it impossible to deterministically identify or share KV-Cache blocks between different vLLM instances.
This PR replaces the integer ID with the LoRA name in the hash calculation, making KV-Cache hashing consistent across instances and allowing reliable cache lookups, routing, and sharing.
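A rough, self-contained sketch of the idea (the `block_hash` helper below is illustrative, not vLLM's actual implementation): the integer ID depends on registration order and so cannot be a stable hash input, while the adapter name can.

```python
# Illustrative sketch: why hashing by LoRA name is stable across instances
# while hashing by the auto-assigned integer ID is not. Not vLLM's real code.
import hashlib
from typing import Optional


def block_hash(parent_hash: bytes, token_ids: tuple, lora_key: Optional[str]) -> bytes:
    """Hash a KV-cache block from its parent hash, its tokens, and the LoRA key."""
    payload = repr((parent_hash, token_ids, lora_key)).encode()
    return hashlib.sha256(payload).digest()


tokens = tuple(range(16))  # one block of 16 tokens

# Before: two instances that registered the same adapter in a different order
# see different integer IDs (say 1 vs. 3), so the same block hashes differently.
assert block_hash(b"", tokens, "1") != block_hash(b"", tokens, "3")

# After: keying on the adapter name yields identical hashes on every instance.
assert block_hash(b"", tokens, "sql-lora") == block_hash(b"", tokens, "sql-lora")
```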
Test Plan
Test Result
All updated tests pass.
Profiling
Performance testing shows negligible overhead when using LoRA names instead of integer IDs for KV-cache block hashing.
block size: 16
num blocks: 3125
total tokens per run: 3,125 × 16 = 50,000 tokens

=== System Information ===
Platform: macOS-15.6.1-arm64-arm-64bit
Processor: arm
Python version: 3.12.11
CPU count: 10
RAM: 64.0 GB
=========================

=== LoRA Key Type Profiling Summary ===
LoRA requests processed per run: 3,125
Profiling config: 1000 runs, 3125 requests/run, block_size=16
---------------------------------------
lora_string: mean=0.0010s, std=0.0000s
  Mean time per LoRA request: 0.00000031s
lora_int: mean=0.0010s, std=0.0000s
  Mean time per LoRA request: 0.00000031s
---------------------------------------
Comparison (relative performance):
String names are 1.02x slower than int IDs
Overhead: +0.0000s per 3,125 requests (+0.00000001s per request)
=======================================

code: https://gist.github.com/sagiahrac/dfa26f54f0514fbf8e1c7a99527cfb8b
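For reference, a simplified stand-in for the benchmark (the real script is in the gist above; the loop structure below is an assumption): it hashes 3,125 blocks of 16 tokens per run with either an int-derived key or a string key and reports the mean per-run time.

```python
# Simplified micro-benchmark sketch (the actual script is in the linked gist).
import hashlib
import time

BLOCK_SIZE = 16
NUM_BLOCKS = 3125
RUNS = 100  # the gist uses 1000 runs; reduced here to keep the sketch quick


def hash_all_blocks(lora_key) -> bytes:
    parent = b""
    for i in range(NUM_BLOCKS):
        tokens = tuple(range(i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE))
        parent = hashlib.sha256(repr((parent, tokens, lora_key)).encode()).digest()
    return parent


def bench(lora_key) -> float:
    start = time.perf_counter()
    for _ in range(RUNS):
        hash_all_blocks(lora_key)
    return (time.perf_counter() - start) / RUNS


print(f"lora_int:    {bench(1):.4f}s per run")
print(f"lora_string: {bench('my-lora-adapter'):.4f}s per run")
```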