Replies: 1 comment
-
No, this is not supported unless you modify the vLLM code.
-
I have a setup with a single large document that many users may have questions about. Currently, I'm concatenating `document || question` as the prompt for each request. This does benefit from the prefix cache, but still incurs some overhead. I wonder if it would be possible to support `cache_reference_to_document || question`, so the reuse is more explicit and the overhead is reduced.
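The layout described above can be sketched in plain Python. This is only an illustration of the prompt structure, not a vLLM API: `build_prompt` and `DOCUMENT` are hypothetical names. The key point is that every request starts with the identical document text, which is exactly the condition vLLM's automatic prefix caching needs to reuse the KV cache across requests.

```python
import os

# Illustrative placeholder for the large shared document (hypothetical name).
DOCUMENT = "Background: " + "some long reference material. " * 50

def build_prompt(document: str, question: str) -> str:
    # The shared document comes first, so all prompts share a common prefix
    # and only the question suffix differs between requests.
    return f"{document}\n\nQuestion: {question}\nAnswer:"

prompts = [
    build_prompt(DOCUMENT, q)
    for q in ["What is the main topic?", "Who is the intended audience?"]
]

# Every prompt starts with the identical document text, so the KV cache
# computed for the document portion can be reused across requests.
shared_prefix = os.path.commonprefix(prompts)
assert shared_prefix.startswith(DOCUMENT)
```

With this layout, enabling vLLM's prefix caching (e.g. `enable_prefix_caching=True` in the engine arguments) should let the document portion be computed once and reused; there is no explicit cache-handle API, so the reuse remains implicit, keyed on the matching token prefix.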