
Commit

Fix triton compilation issue (vllm-project#3984)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Bellk17 and WoosukKwon authored Apr 12, 2024
1 parent fbb9d9e commit d04973a
Showing 1 changed file with 5 additions and 1 deletion.
vllm/attention/ops/triton_flash_attention.py: 5 additions & 1 deletion
@@ -415,7 +415,11 @@ def attn_fwd(
         return
 
     is_mqa = hq != hk
-    off_h_k = off_h_q % hk if is_mqa else off_h_q
+    if is_mqa:  # noqa: SIM108
+        off_h_k = off_h_q % hk
+    else:
+        off_h_k = off_h_q
 
     n_extra_tokens = 0
     if seqlen_k < BLOCK_N:
         n_extra_tokens = BLOCK_N - seqlen_k
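The fix replaces an inline conditional expression with an explicit if/else block; per the commit title, the one-line form is what triggered the Triton compilation issue, and the # noqa: SIM108 comment tells flake8-simplify not to suggest collapsing the branch back into a ternary. As a reading aid only (a hypothetical standalone snippet, not code from the vLLM kernel), the same MQA/GQA head-index mapping in plain Python, reusing the variable names from the diff:

# Illustrative sketch, not part of the commit: maps a query-head index to the
# key/value head it should read from, using the explicit branch introduced above.
def kv_head_index(off_h_q: int, hq: int, hk: int) -> int:
    is_mqa = hq != hk            # more query heads than KV heads -> MQA/GQA layout
    if is_mqa:
        off_h_k = off_h_q % hk   # several query heads share one KV head
    else:
        off_h_k = off_h_q        # plain MHA: heads map one-to-one
    return off_h_k

# Example: 32 query heads over 8 KV heads -> query head 19 reads KV head 3.
assert kv_head_index(19, 32, 8) == 3
assert kv_head_index(5, 16, 16) == 5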

0 comments on commit d04973a
