Commit d695d12

iAmir97 and gemini-code-assist[bot] authored and committed
[Bugfix] Add reset prefix cache for online serving (vllm-project#22726)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
1 parent 01745ba commit d695d12

2 files changed, +2 -0 lines changed

vllm/engine/async_llm_engine.py

Lines changed: 1 addition & 0 deletions
@@ -1092,6 +1092,7 @@ async def reset_prefix_cache(self,
         self.engine.reset_prefix_cache(device)
 
     async def sleep(self, level: int = 1) -> None:
+        await self.reset_prefix_cache()
         self.engine.sleep(level)
 
     async def wake_up(self, tags: Optional[list[str]] = None) -> None:

vllm/v1/engine/async_llm.py

Lines changed: 1 addition & 0 deletions
@@ -576,6 +576,7 @@ async def reset_prefix_cache(self,
         await self.engine_core.reset_prefix_cache_async()
 
     async def sleep(self, level: int = 1) -> None:
+        await self.reset_prefix_cache()
         await self.engine_core.sleep_async(level)
 
     async def wake_up(self, tags: Optional[list[str]] = None) -> None:
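The commit does not spell out the motivation, so the following is only a toy sketch of why resetting the prefix cache before sleeping can matter. It assumes that sleeping discards cached KV contents while the prefix-to-block index would otherwise survive. `ToyEngine`, `prefill`, `kv_blocks`, and `run_demo` are hypothetical names for illustration and are not part of vLLM.

```python
import asyncio
from typing import Optional


class ToyEngine:
    """Hypothetical stand-in for an async engine; not vLLM's real class."""

    def __init__(self, reset_on_sleep: bool) -> None:
        self.reset_on_sleep = reset_on_sleep
        # Maps a prompt prefix to the index of the block that caches it.
        self.prefix_cache: dict[str, int] = {}
        # Simulated KV memory; None marks a block whose contents were discarded.
        self.kv_blocks: list[Optional[str]] = []

    async def prefill(self, prefix: str) -> str:
        block_id = self.prefix_cache.get(prefix)
        if block_id is not None:
            # Cache hit: reuse whatever the indexed block currently holds.
            return f"hit:{self.kv_blocks[block_id]}"
        # Cache miss: compute a new block and index it by prefix.
        self.kv_blocks.append(f"kv({prefix})")
        self.prefix_cache[prefix] = len(self.kv_blocks) - 1
        return "miss"

    async def reset_prefix_cache(self) -> None:
        self.prefix_cache.clear()

    async def sleep(self, level: int = 1) -> None:
        if self.reset_on_sleep:
            # Mirrors the one-line change in this commit.
            await self.reset_prefix_cache()
        # Assumption: sleeping throws away the contents of every KV block.
        self.kv_blocks = [None] * len(self.kv_blocks)

    async def wake_up(self) -> None:
        # Nothing to restore in this toy model.
        return None


async def demo(reset_on_sleep: bool) -> list[str]:
    engine = ToyEngine(reset_on_sleep)
    results = [await engine.prefill("system prompt")]
    await engine.sleep()
    await engine.wake_up()
    results.append(await engine.prefill("system prompt"))
    return results


def run_demo(reset_on_sleep: bool) -> list[str]:
    return asyncio.run(demo(reset_on_sleep))
```

In this model, `run_demo(False)` produces a stale hit on a discarded block after wake-up, while `run_demo(True)` falls back to a clean miss and recomputes the prefix.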
