Closed
Labels
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
Description
What would you like to be added:
vLLM now supports an optional `cache_salt` request field. The salt is used as part of the prefix-cache key, so two requests with the same prefix but different salts do not share cache entries. This is a security feature.
{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Here is a document with details about the world series: ..."},
{"role": "user", "content": "Who won the world series in 2020?"}
],
"cache_salt": "Z3V2bmV3aGxza3ZubGFoZ3Zud3V3ZWZ2bmd0b3V2bnZmc2xpZ3RoZ2x2aQ=="
}
The change would be to make the prefix-aware scorers salt-aware. To implement this, we can add `cache_salt` to the `LLMRequest` and make sure it is included when calculating prefix matches, so that a request is only scored as a prefix hit against cached entries created with the same salt.
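As a rough illustration of the idea (not the project's actual scorer code), the sketch below seeds a chained block hash with the salt, so every block hash depends on it. The `LLMRequest` dataclass here is a simplified stand-in, and `BLOCK_SIZE`, `prefix_block_hashes`, and `matched_blocks` are hypothetical names introduced only for this example; hashing over characters rather than tokens is also an assumption made to keep it self-contained.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

BLOCK_SIZE = 16  # characters per block; hypothetical value for illustration


@dataclass
class LLMRequest:
    """Simplified stand-in for the real request type (assumption)."""
    prompt: str
    cache_salt: Optional[str] = None


def prefix_block_hashes(req: LLMRequest, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return chained hashes of the prompt's full blocks, seeded with the salt."""
    # Seeding the chain with the salt makes every block hash salt-dependent.
    parent = hashlib.sha256((req.cache_salt or "").encode("utf-8")).hexdigest()
    hashes = []
    # Only full blocks are hashed; a trailing partial block is ignored.
    for start in range(0, len(req.prompt) - block_size + 1, block_size):
        block = req.prompt[start:start + block_size]
        parent = hashlib.sha256((parent + block).encode("utf-8")).hexdigest()
        hashes.append(parent)
    return hashes


def matched_blocks(a: List[str], b: List[str]) -> int:
    """Count how many leading block hashes two requests have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


prompt = "You are a helpful assistant. Who won the world series in 2020?"
tenant_a = prefix_block_hashes(LLMRequest(prompt, cache_salt="salt-a"))
tenant_a_again = prefix_block_hashes(LLMRequest(prompt, cache_salt="salt-a"))
tenant_b = prefix_block_hashes(LLMRequest(prompt, cache_salt="salt-b"))

print(matched_blocks(tenant_a, tenant_a_again))  # same salt: all blocks match
print(matched_blocks(tenant_a, tenant_b))        # different salt: no blocks match
```

With this scheme a scorer that compares block hashes needs no extra salt check: requests with different salts simply never produce a matching first block, while requests without a salt keep matching each other as before.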
Why is this needed: