Commit 290c4b4

Enable prefix caching with full cuda graphs (vllm-project#19617)

Authored by WoosukKwon, committed by yangw-dev

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Yang Wang <elainewy@meta.com>

1 parent 51546b6 · commit 290c4b4

File tree: 1 file changed, +0 −1 lines

vllm/config.py

Lines changed: 0 additions & 1 deletion
@@ -4495,7 +4495,6 @@ def __post_init__(self):
                     "full_cuda_graph is not supported with "
                     "cascade attention. Disabling cascade attention.")
                 self.model_config.disable_cascade_attn = True
-                self.cache_config.enable_prefix_caching = False

         if (self.kv_events_config is not None
                 and self.kv_events_config.enable_kv_cache_events
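For context: with this one-line removal, enabling full CUDA graphs no longer force-disables prefix caching. Below is a minimal usage sketch of running the two features together, assuming a vLLM build that contains this commit; the model name is a placeholder, and the `enable_prefix_caching` and `compilation_config={"full_cuda_graph": True}` arguments reflect vLLM's engine arguments and CompilationConfig around the time of this change and may differ in other versions.

    # Hypothetical sketch: prefix caching together with full CUDA graphs.
    # Assumes a vLLM build that includes this commit; flag names may vary
    # across versions.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",    # placeholder model
        enable_prefix_caching=True,                  # no longer force-disabled
        compilation_config={"full_cuda_graph": True},
    )

    # Prompts sharing a common prefix benefit from the prefix cache.
    shared_prefix = "You are a helpful assistant. Answer concisely.\n\n"
    prompts = [shared_prefix + q for q in
               ["What is KV caching?", "What is CUDA graph capture?"]]

    outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
    for out in outputs:
        print(out.outputs[0].text)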
