Commit ae2dc17

[deepseek_r1] reduce DMA transpose (#1404)
Co-authored-by: Chen Xinyu <xinyu1.chen@intel.com>
1 parent fed35b8 commit ae2dc17

1 file changed, +2 -2 lines changed

vllm/attention/backends/mla/utils.py

Lines changed: 2 additions & 2 deletions
@@ -445,9 +445,9 @@ def get_scales(layer: LinearBase) -> torch.Tensor:
             self.tp_size = get_tensor_model_parallel_world_size()
         else:
             # Convert from (L, N, V) to (N, L, V)
-            self.W_UV = W_UV.transpose(0, 1)
+            self.W_UV = W_UV.transpose(0, 1).contiguous()
             # Convert from (L, N, P) to (N, P, L)
-            self.W_UK_T = W_UK.permute(1, 2, 0)
+            self.W_UK_T = W_UK.permute(1, 2, 0).contiguous()

     @abstractmethod
     def _forward_prefill(
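
Note on the change: in PyTorch, `transpose` and `permute` return views over the original storage, so without `.contiguous()` the tensors `W_UV` and `W_UK_T` keep the old memory layout and only their strides are reordered. Calling `.contiguous()` copies the data once into a dense layout. Judging by the commit title, the intent is presumably to pay for that copy once when the weights are processed rather than on later uses. The sketch below is a pure-Python model of row-major stride bookkeeping, not vLLM or PyTorch code. The helper names and the sizes `L`, `N`, `P` are hypothetical and chosen only for illustration, and the contiguity check is simplified (it assumes no dimension has size 1).

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major tensor."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def permute_view(shape, strides, order):
    """Reorder dimensions without moving data, as a view-style permute does."""
    return tuple(shape[i] for i in order), tuple(strides[i] for i in order)


def is_contiguous(shape, strides):
    """True when the strides match a dense row-major layout for this shape."""
    return strides == row_major_strides(shape)


# Hypothetical sizes for illustration only; not taken from the commit.
L, N, P = 512, 16, 128

w_uk_shape = (L, N, P)
w_uk_strides = row_major_strides(w_uk_shape)

# permute(1, 2, 0): (L, N, P) -> (N, P, L). Only the metadata changes.
view_shape, view_strides = permute_view(w_uk_shape, w_uk_strides, (1, 2, 0))

# A contiguous copy of that view would have these dense strides instead.
dense_strides = row_major_strides(view_shape)

print(view_shape, view_strides, is_contiguous(view_shape, view_strides))
print(view_shape, dense_strides, is_contiguous(view_shape, dense_strides))
```

In the permuted view the innermost dimension (`L`) has the largest stride, so reading along it jumps across memory. After a contiguous copy the same dimension has stride 1. On real tensors, `Tensor.stride()` and `Tensor.is_contiguous()` report the corresponding values.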
