Commit 21fa74b

ZhongYingMatrix authored and DamonFool committed
Fix mla prefill context performance (vllm-project#13897)
Signed-off-by: ZhongYingMatrix <zhongyingmatrix@gmail.com>
1 parent 93fa3f2 commit 21fa74b

2 files changed: +2 -2 lines changed

vllm/attention/backends/mla/common.py

Lines changed: 1 addition & 1 deletion

@@ -1308,7 +1308,7 @@ def _compute_prefill_context(
             )
 
             kv_c_normed = workspace[:toks]\
-                [..., :self.kv_lora_rank].unsqueeze(1)
+                [..., :self.kv_lora_rank]
             k_pe = workspace[:toks]\
                 [..., self.kv_lora_rank:].unsqueeze(1)
 

vllm/v1/attention/backends/mla/common.py

Lines changed: 1 addition & 1 deletion

@@ -874,7 +874,7 @@ def _compute_prefill_context(
             )
 
             kv_c_normed = workspace[:toks]\
-                [..., :self.kv_lora_rank].unsqueeze(1)
+                [..., :self.kv_lora_rank]
             k_pe = workspace[:toks]\
                 [..., self.kv_lora_rank:].unsqueeze(1)
 
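
Note on the change: in both files the only edit is dropping `.unsqueeze(1)` from `kv_c_normed`, while `k_pe` keeps it. The commit message does not say why this affects performance, so the sketch below only tracks the resulting shapes. It is plain Python rather than PyTorch; the helpers `slice_last` and `unsqueeze` are hypothetical stand-ins for tensor indexing and `Tensor.unsqueeze`, and the sizes are assumed example values, not taken from the commit.

```python
# Shape bookkeeping only: no PyTorch needed. All helper names are hypothetical.

def slice_last(shape, start, stop):
    """Shape after tensor[..., start:stop]."""
    return shape[:-1] + (stop - start,)

def unsqueeze(shape, dim):
    """Shape after tensor.unsqueeze(dim) for a non-negative dim."""
    return shape[:dim] + (1,) + shape[dim:]

# Assumed example sizes, not taken from the commit.
toks, kv_lora_rank, rope_dim = 8, 512, 64
workspace = (toks, kv_lora_rank + rope_dim)  # shape of workspace[:toks]

# kv_c_normed before the commit: slice, then unsqueeze(1).
before = unsqueeze(slice_last(workspace, 0, kv_lora_rank), 1)
# kv_c_normed after the commit: slice only.
after = slice_last(workspace, 0, kv_lora_rank)
# k_pe is unchanged by the commit and keeps its singleton dimension.
k_pe = unsqueeze(slice_last(workspace, kv_lora_rank, kv_lora_rank + rope_dim), 1)

print(before)  # (8, 1, 512)
print(after)   # (8, 512)
print(k_pe)    # (8, 1, 64)
```

After the change, `kv_c_normed` is a 2-D `[toks, kv_lora_rank]` view instead of a 3-D `[toks, 1, kv_lora_rank]` one, so whatever consumes it downstream receives a plain matrix.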
