
Commit 120b2fd

caoshiyi authored and jimpang committed
Prefix Caching- fix t4 triton error (vllm-project#2517)
1 parent b475897 commit 120b2fd

File tree

1 file changed (+3, −1 lines)


vllm/model_executor/layers/triton_kernel/prefix_prefill.py

Lines changed: 3 additions & 1 deletion
@@ -618,7 +618,9 @@ def context_attention_fwd(q,
                           b_ctx_len,
                           max_input_len,
                           alibi_slopes=None):
-    BLOCK = 128
+
+    cap = torch.cuda.get_device_capability()
+    BLOCK = 128 if cap[0] >= 8 else 64
     # shape constraints
     Lq, Lk, Lv = q.shape[-1], k.shape[-1], v.shape[-1]
     assert Lq == Lk and Lk == Lv
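
For context, here is a minimal standalone sketch (not vLLM code) of what the added capability check does. torch.cuda.get_device_capability() returns a (major, minor) tuple, e.g. (7, 5) on a Turing T4 and (8, 0) on an A100. Pre-Ampere GPUs such as the T4 have tighter shared-memory limits per SM, which is presumably why the 128-wide Triton tile fails there; the patch falls back to a 64-wide tile on any device below compute capability 8. The pick_block_size helper is hypothetical, added only for illustration.

    import torch

    def pick_block_size() -> int:
        # Hypothetical helper mirroring the patched logic above.
        # Turing (sm_75, e.g. T4) has less shared memory per SM than
        # Ampere (sm_8x), so the 128-wide tile can fail to compile or
        # launch in Triton; use 64 on pre-Ampere devices instead.
        cap = torch.cuda.get_device_capability()  # (major, minor)
        return 128 if cap[0] >= 8 else 64

    if __name__ == "__main__":
        if torch.cuda.is_available():
            print(f"capability={torch.cuda.get_device_capability()}, "
                  f"BLOCK={pick_block_size()}")
        else:
            print("No CUDA device available.")

On a T4 this prints capability=(7, 5), BLOCK=64; on an A100, capability=(8, 0), BLOCK=128, matching the behavior of the patched context_attention_fwd.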

0 commit comments
