Support vLLM cache salting #1631

@liu-cong

Description

What would you like to be added:

vLLM now supports an optional `cache_salt`, which is folded into the prefix-cache key so that requests with the same prefix but different salts do not share cache entries. This is a security feature. For example:

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Here is a document with details about the world series: ..."},
    {"role": "user", "content": "Who won the world series in 2020?"}
  ],
  "cache_salt": "Z3V2bmV3aGxza3ZubGFoZ3Zud3V3ZWZ2bmd0b3V2bnZmc2xpZ3RoZ2x2aQ=="
}
```

vllm-project/vllm#17045
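
Conceptually, the salt seeds the hash chain that produces prefix-cache keys, so identical prefixes under different salts never produce colliding keys. A minimal Go sketch of the idea (illustrative names only; vLLM's actual implementation incorporates the salt into the hash of the first block):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// blockHashes computes a chained hash per fixed-size block of the prompt,
// seeding the chain with the cache salt. Two requests with identical
// prefixes but different salts therefore yield disjoint key sequences.
// Hypothetical sketch, not vLLM's internals.
func blockHashes(prompt string, blockSize int, cacheSalt string) []uint64 {
	seed := fnv.New64a()
	seed.Write([]byte(cacheSalt)) // the salt seeds the first block's hash
	parent := seed.Sum64()
	var hashes []uint64
	for start := 0; start+blockSize <= len(prompt); start += blockSize {
		h := fnv.New64a()
		fmt.Fprintf(h, "%d", parent) // chain in the parent block's hash
		h.Write([]byte(prompt[start : start+blockSize]))
		parent = h.Sum64()
		hashes = append(hashes, parent)
	}
	return hashes
}

func main() {
	a := blockHashes("Here is a document with details...", 8, "saltA")
	b := blockHashes("Here is a document with details...", 8, "saltB")
	fmt.Println(a[0] != b[0]) // true: same prefix, different salts, no shared keys
}
```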

The change would be to make the prefix-aware scorers salt-aware. To implement this, we can add the `cache_salt` to the `LLMRequest` and make sure it is used when calculating prefix matches (a sketch follows below).
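
A hypothetical sketch of that change, reusing the `blockHashes` helper from the sketch above; the `CacheSalt` field and `matchedBlocks` function are assumptions, not the project's actual API:

```go
// Hypothetical: CacheSalt is carried on the request, parsed from the
// request body's "cache_salt" field, and threaded into prefix matching.
type LLMRequest struct {
	Model     string
	Prompt    string
	CacheSalt string
}

// matchedBlocks counts how many leading blocks of the request are already
// present in a server's prefix-cache index. Because the salt seeds the
// block hashes, entries cached under a different salt can never match.
func matchedBlocks(req LLMRequest, blockSize int, serverIndex map[uint64]bool) int {
	matched := 0
	for _, h := range blockHashes(req.Prompt, blockSize, req.CacheSalt) {
		if !serverIndex[h] {
			break // a prefix match ends at the first miss
		}
		matched++
	}
	return matched
}
```

A scorer could then rank servers by matched blocks exactly as it does today; the only behavioral change is that the salt participates in key computation.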

Why is this needed:

Cache salting exists to prevent requests that should be isolated (e.g., different users or tenants) from sharing prefix-cache entries. If the prefix-aware scorers ignore the salt, they will predict cache hits for requests that cannot actually share cached prefixes, producing incorrect scores and suboptimal routing.
