Support vLLM cache salting #1631

@liu-cong

Description

What would you like to be added:

vLLM now supports an optional `cache_salt`, which is folded into the prefix-cache key so that requests with the same prefix but different salts do not share cache entries. This is a security feature. For example:

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Here is a document with details about the world series: ..."},
    {"role": "user", "content": "Who won the world series in 2020?"}
  ],
  "cache_salt": "Z3V2bmV3aGxza3ZubGFoZ3Zud3V3ZWZ2bmd0b3V2bnZmc2xpZ3RoZ2x2aQ=="
}
```

vllm-project/vllm#17045
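
Conceptually, the salt seeds the hash chain that produces prefix-cache keys, so identical prefixes under different salts never produce colliding keys. A minimal Go sketch of the idea (illustrative names only; vLLM's actual implementation incorporates the salt into the hash of the first block):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// blockHashes computes a chained hash per fixed-size block of the prompt,
// seeding the chain with the cache salt. Two requests with identical
// prefixes but different salts therefore yield disjoint key sequences.
// Hypothetical sketch, not vLLM's internals.
func blockHashes(prompt string, blockSize int, cacheSalt string) []uint64 {
	seed := fnv.New64a()
	seed.Write([]byte(cacheSalt)) // the salt seeds the first block's hash
	parent := seed.Sum64()
	var hashes []uint64
	for start := 0; start+blockSize <= len(prompt); start += blockSize {
		h := fnv.New64a()
		fmt.Fprintf(h, "%d", parent) // chain in the parent block's hash
		h.Write([]byte(prompt[start : start+blockSize]))
		parent = h.Sum64()
		hashes = append(hashes, parent)
	}
	return hashes
}

func main() {
	a := blockHashes("Here is a document with details...", 8, "saltA")
	b := blockHashes("Here is a document with details...", 8, "saltB")
	fmt.Println(a[0] != b[0]) // true: same prefix, different salts, no shared keys
}
```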

The change would be to make the prefix-aware scorers salt-aware. To implement this, we can add the `cache_salt` to the `LLMRequest` and make sure it is used when calculating prefix matches (a sketch follows below).
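
A hypothetical sketch of that change, reusing the `blockHashes` helper from the sketch above; the `CacheSalt` field and `matchedBlocks` function are assumptions, not the project's actual API:

```go
// Hypothetical: CacheSalt is carried on the request, parsed from the
// request body's "cache_salt" field, and threaded into prefix matching.
type LLMRequest struct {
	Model     string
	Prompt    string
	CacheSalt string
}

// matchedBlocks counts how many leading blocks of the request are already
// present in a server's prefix-cache index. Because the salt seeds the
// block hashes, entries cached under a different salt can never match.
func matchedBlocks(req LLMRequest, blockSize int, serverIndex map[uint64]bool) int {
	matched := 0
	for _, h := range blockHashes(req.Prompt, blockSize, req.CacheSalt) {
		if !serverIndex[h] {
			break // a prefix match ends at the first miss
		}
		matched++
	}
	return matched
}
```

A scorer could then rank servers by matched blocks exactly as it does today; the only behavioral change is that the salt participates in key computation.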

Why is this needed:

Cache salting exists to prevent requests that should be isolated (e.g., different users or tenants) from sharing prefix-cache entries. If the prefix-aware scorers ignore the salt, they will predict cache hits for requests that cannot actually share cached prefixes, producing incorrect scores and suboptimal routing.
