Commit 078ac25

rkooo567 authored and DarkLight1337 committed
[misc] Do not allow to use LoRA with chunked prefill. (vllm-project#5538)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
1 parent dd5f535 commit 078ac25

File tree

1 file changed: +2 -0 lines changed

vllm/config.py

Lines changed: 2 additions & 0 deletions

@@ -1092,6 +1092,8 @@ def verify_with_scheduler_config(self, scheduler_config: SchedulerConfig):
                 "Due to limitations of the custom LoRA CUDA kernel, "
                 "max_num_batched_tokens must be <= 65528 when "
                 "LoRA is enabled.")
+        if scheduler_config.chunked_prefill_enabled:
+            raise ValueError("LoRA is not supported with chunked prefill yet.")


 @dataclass
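The two added lines guard against enabling LoRA together with chunked prefill by raising during config verification. A minimal standalone sketch of the same validation pattern, using simplified stand-in config classes (the class shapes and defaults here are assumptions; only the field names and error messages come from the diff):

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    # Stand-in for vLLM's SchedulerConfig; only the two fields
    # referenced by the diff are modeled here.
    max_num_batched_tokens: int = 2048
    chunked_prefill_enabled: bool = False


@dataclass
class LoRAConfig:
    # Stand-in for vLLM's LoRAConfig.
    def verify_with_scheduler_config(self, scheduler_config: SchedulerConfig) -> None:
        # Pre-existing check from the surrounding context lines:
        # the custom LoRA CUDA kernel caps the batched-token count.
        if scheduler_config.max_num_batched_tokens > 65528:
            raise ValueError(
                "Due to limitations of the custom LoRA CUDA kernel, "
                "max_num_batched_tokens must be <= 65528 when "
                "LoRA is enabled.")
        # The two lines added by this commit: reject the combination
        # of LoRA and chunked prefill outright.
        if scheduler_config.chunked_prefill_enabled:
            raise ValueError("LoRA is not supported with chunked prefill yet.")


lora_config = LoRAConfig()
lora_config.verify_with_scheduler_config(SchedulerConfig())  # ok
try:
    lora_config.verify_with_scheduler_config(
        SchedulerConfig(chunked_prefill_enabled=True))
except ValueError as e:
    print(e)  # LoRA is not supported with chunked prefill yet.
```

Failing fast at configuration time, rather than at request time, surfaces the unsupported combination before the engine starts serving.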

0 commit comments