Description
Hi, I'm trying to understand the workflow of vLLM, and I'm interested in prefix caching.
So I want to know the conditions under which prefix caching applies and which component (Scheduler, Executor, Worker, Runner, etc.) gets the KV cache.
In the v1 code, I found it here:
vllm/vllm/v1/core/scheduler.py
Lines 215 to 217 in 067fa22
But in v0, I failed to find the equivalent code.
Can anyone help me?
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.