Commit

[misc] Do not allow to use lora with chunked prefill. (vllm-project#5538)

Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2 people authored and jimpang committed Jul 24, 2024
1 parent aaffbca commit c1d82e0
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions vllm/config.py
@@ -1092,6 +1092,8 @@ def verify_with_scheduler_config(self, scheduler_config: SchedulerConfig):
                 "Due to limitations of the custom LoRA CUDA kernel, "
                 "max_num_batched_tokens must be <= 65528 when "
                 "LoRA is enabled.")
+        if scheduler_config.chunked_prefill_enabled:
+            raise ValueError("LoRA is not supported with chunked prefill yet.")
 
 
 @dataclass
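
As a rough illustration of what the added check does, the sketch below reproduces the validation in isolation. `SchedulerConfigSketch` and `LoRAConfigSketch` are hypothetical stand-ins, not vLLM's real classes; they carry only the two fields the check reads. The `> 65528` condition is inferred from the error message, since the diff does not show that line.

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfigSketch:
    """Hypothetical stand-in holding only the fields the check reads."""
    max_num_batched_tokens: int
    chunked_prefill_enabled: bool = False


@dataclass
class LoRAConfigSketch:
    """Hypothetical stand-in for the LoRA config that owns the check."""

    def verify_with_scheduler_config(
            self, scheduler_config: SchedulerConfigSketch) -> None:
        # Assumed condition, inferred from the error message in the diff.
        if scheduler_config.max_num_batched_tokens > 65528:
            raise ValueError(
                "Due to limitations of the custom LoRA CUDA kernel, "
                "max_num_batched_tokens must be <= 65528 when "
                "LoRA is enabled.")
        # The two lines added by this commit: fail fast at config time
        # instead of running LoRA together with chunked prefill.
        if scheduler_config.chunked_prefill_enabled:
            raise ValueError(
                "LoRA is not supported with chunked prefill yet.")


if __name__ == "__main__":
    lora = LoRAConfigSketch()
    # Passes: chunked prefill is off and the token budget is in range.
    lora.verify_with_scheduler_config(
        SchedulerConfigSketch(max_num_batched_tokens=2048))
    # Rejected: chunked prefill is on.
    try:
        lora.verify_with_scheduler_config(
            SchedulerConfigSketch(max_num_batched_tokens=2048,
                                  chunked_prefill_enabled=True))
    except ValueError as err:
        print(err)
```

Raising during config verification means the unsupported combination is rejected before any model is loaded, rather than surfacing later as a kernel-level failure.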