
Commit c30aac4

lsy323 authored and minpeter committed
[Bugfix][TPU] Use np array when updating cache slot_mapping (vllm-project#17971)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: minpeter <kali2005611@gmail.com>
1 parent eab0c32 commit c30aac4


1 file changed (+1, -1 lines)

vllm/v1/worker/tpu_model_runner.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -531,7 +531,7 @@ def _prepare_inputs(self, scheduler_output: "SchedulerOutput"):
         np.add(block_numbers * self.block_size,
                block_offsets,
                out=self.input_batch.block_table.
-               slot_mapping_cpu[:total_num_scheduled_tokens])
+               slot_mapping_np[:total_num_scheduled_tokens])
 
         # Prepare the attention metadata.
         self.query_start_loc_np[0] = 0
```
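The fix changes the `out=` target of `np.add` from `slot_mapping_cpu` to `slot_mapping_np`. NumPy documents `out` for ufuncs as an ndarray (or a tuple of ndarrays). Our understanding, which is an assumption not shown in this diff, is that `slot_mapping_cpu` is a CPU `torch.Tensor` and `slot_mapping_np` is a NumPy view of it created with `Tensor.numpy()`. Because such a view shares memory with the tensor, writing through it also updates the tensor. The sketch below uses NumPy only. The helper `fill_slot_mapping` and its inputs (`block_table`, `req_indices`, `positions`) are hypothetical. It shows the slot computation `slot = block_number * block_size + offset` being written in place into a slice of a preallocated buffer.

```python
import numpy as np


def fill_slot_mapping(block_table: np.ndarray,
                      req_indices: np.ndarray,
                      positions: np.ndarray,
                      block_size: int,
                      slot_mapping_np: np.ndarray) -> None:
    """Hypothetical helper: map each scheduled token to a KV-cache slot.

    block_table[r, i] is the physical block holding the i-th logical block
    of request r; req_indices[t] and positions[t] give the request and the
    token position of scheduled token t.
    """
    max_blocks_per_req = block_table.shape[1]
    # Flattened (request, logical block) index for every scheduled token.
    block_table_indices = req_indices * max_blocks_per_req + positions // block_size
    block_numbers = block_table.ravel()[block_table_indices]
    block_offsets = positions % block_size
    total_num_scheduled_tokens = positions.shape[0]
    # out= must be a NumPy array; a slice of the preallocated buffer is a
    # view, so the result is written directly into slot_mapping_np.
    np.add(block_numbers * block_size,
           block_offsets,
           out=slot_mapping_np[:total_num_scheduled_tokens])


# Usage: request 0 has 6 tokens (blocks 3 and 7), request 1 has 2 tokens (block 5).
block_table = np.array([[3, 7], [5, 0]], dtype=np.int32)
req_indices = np.array([0, 0, 0, 0, 0, 0, 1, 1])
positions = np.array([0, 1, 2, 3, 4, 5, 0, 1])
slot_mapping_np = np.zeros(16, dtype=np.int64)
fill_slot_mapping(block_table, req_indices, positions, 4, slot_mapping_np)
print(slot_mapping_np[:8])  # [12 13 14 15 28 29 20 21]
```

Writing into a preallocated buffer through a view avoids allocating a new array every step. The unused tail of the buffer (`slot_mapping_np[8:]` here) is left untouched.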
