📚 The doc issue
In examples/offline_inference/cpu_offload_lmcache.py, line 43 contains the comment:
# Note that LMCache is not compatible with chunked prefill for now.
This note is now outdated: both vLLM and LMCache have merged PRs that add full chunked prefill support (vLLM PR #14505, LMCache PR #392). Leaving the comment in place may mislead users into disabling a feature that now works.
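For context, the comment sits in a script that wires LMCache CPU offload into vLLM roughly like the sketch below. This is a hedged reconstruction, not the exact file contents: KVTransferConfig, LMCacheConnector, and enable_chunked_prefill are real vLLM names as of the example's era, while the model name, cache size, and environment variables are illustrative placeholders modeled on the example.

```python
import os

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# LMCache keeps offloaded KV cache in CPU memory (values are illustrative).
os.environ["LMCACHE_LOCAL_CPU"] = "True"
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"  # CPU cache budget in GiB

# Route the KV cache through the LMCache connector.
ktc = KVTransferConfig.from_cli(
    '{"kv_connector": "LMCacheConnector", "kv_role": "kv_both"}')

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # illustrative model
    kv_transfer_config=ktc,
    max_model_len=8000,
    gpu_memory_utilization=0.8,
    # This is the spot the line-43 comment guards. With vLLM#14505 and
    # LMCache#392 merged, chunked prefill no longer needs to be disabled.
    enable_chunked_prefill=True,
)

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
print(outputs[0].outputs[0].text)
```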
Suggest a potential alternative/fix
Update the comment to either:
# Note: LMCache supports chunked prefill (see vLLM#14505, LMCache#392).
Or remove it entirely if compatibility is now considered stable/default.
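In diff form, the first option would be:

```diff
-# Note that LMCache is not compatible with chunked prefill for now.
+# Note: LMCache supports chunked prefill (see vLLM#14505, LMCache#392).
```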
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.