Commit ff21cf6

robertgshaw2-redhat and Robert Shaw authored and committed
[TPU] [Perf] Improve Memory Usage Estimation (vllm-project#15671)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
1 parent 29740c5 commit ff21cf6

1 file changed: +7 -1 lines changed


vllm/v1/worker/tpu_worker.py

Lines changed: 7 additions & 1 deletion
@@ -161,7 +161,13 @@ def determine_available_memory(self) -> int:
         # intermediate activations.
         m = xm.get_memory_info(self.device)
         total_memory_size = m["bytes_limit"]
-        profiled = m["peak_bytes_used"]  # Weights + intermediate activations.
+        current_mem = m["bytes_used"]
+        # Ideally we would use profiled = m["peak_bytes_used"] to
+        # get weights + activations. But there is memory used during
+        # compilation / weight loading that impacts the peak and
+        # there is no way to reset peak memory in XLA, So we
+        # use the heuristic of 2% of weights.
+        profiled = current_mem * 1.02
 
         # Calculate the TPU KV cache size based on profiling.
         usable_memory_size = int(total_memory_size *
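
The effect of the heuristic can be seen in a small standalone sketch. It is not the vLLM implementation: the function name `estimate_kv_cache_bytes` and the `utilization` and `weight_overhead` parameters are hypothetical, and the final step (subtracting the profiled usage from the budget and clamping at zero) is an assumption, since the hunk is cut off before that point. The dictionary keys are the ones read from `xm.get_memory_info` in the diff.

```python
from typing import Mapping


def estimate_kv_cache_bytes(
    mem_info: Mapping[str, int],
    utilization: float = 0.9,       # hypothetical: fraction of device memory to budget
    weight_overhead: float = 0.02,  # the 2% padding used in the commit
) -> int:
    """Estimate bytes left for the KV cache from an XLA-style memory report."""
    total_memory_size = mem_info["bytes_limit"]
    current_mem = mem_info["bytes_used"]

    # Pad current usage instead of reading "peak_bytes_used", which also
    # includes transient compilation / weight-loading allocations.
    profiled = current_mem * (1.0 + weight_overhead)

    # Assumed continuation: budget minus profiled usage, clamped at zero.
    usable_memory_size = int(total_memory_size * utilization)
    return max(int(usable_memory_size - profiled), 0)


if __name__ == "__main__":
    # Made-up numbers: 16 GiB limit, 8 GiB in use, 13 GiB transient peak.
    gib = 1 << 30
    info = {
        "bytes_limit": 16 * gib,
        "bytes_used": 8 * gib,
        "peak_bytes_used": 13 * gib,
    }
    print(estimate_kv_cache_bytes(info))
```

With the made-up numbers above, a peak-based estimate would leave noticeably less room for the KV cache than the padded `bytes_used` estimate, which is the motivation the commit comment gives for the change.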
