[Misc]: Enable memory usage logging for vLLM GPU worker #17122

Closed · wants to merge 8 commits
34 changes: 19 additions & 15 deletions vllm/v1/worker/gpu_worker.py
@@ -228,21 +228,25 @@ def determine_available_memory(self) -> int:
         non_torch_alloc_bytes = max(0, fwd_alloc_bytes - torch_allocated_bytes)
         # Total forward allocation (peak) is peak torch + non-torch
         peak_memory = peak_torch_memory + non_torch_alloc_bytes
-
-        available_kv_cache_memory = (
-            total_gpu_memory * self.cache_config.gpu_memory_utilization -
-            peak_memory)
-
-        GiB = lambda b: b / GiB_bytes
-        logger.debug(
-            "Initial free memory: %.2f GiB, free memory: %.2f GiB, "
-            "total GPU memory: %.2f GiB", GiB(self.init_gpu_memory),
-            GiB(free_gpu_memory), GiB(total_gpu_memory))
-        logger.debug(
-            "Peak torch memory: %.2f GiB, non-torch forward-pass memory: "
-            "%.2f GiB, available KVCache memory: %.2f GiB",
-            GiB(peak_torch_memory), GiB(non_torch_alloc_bytes),
-            GiB(available_kv_cache_memory))
+        memory_for_current_instance = (
+            total_gpu_memory * self.cache_config.gpu_memory_utilization)
+        available_kv_cache_memory = memory_for_current_instance - peak_memory
+
+        msg = (
+            "The current vLLM instance can use: total GPU memory "
+            f"({total_gpu_memory / GiB_bytes:.2f} GiB)"
+            " x gpu_memory_utilization "
+            f"({self.cache_config.gpu_memory_utilization:.2f})"
+            f" = {memory_for_current_instance / GiB_bytes:.2f} GiB\n"
+            "Non-torch memory takes "
+            f"{non_torch_alloc_bytes / GiB_bytes:.2f} GiB, "
+            "PyTorch peak memory takes "
+            f"{peak_torch_memory / GiB_bytes:.2f} GiB, "
+            "and the rest, reserved for the KV cache, is "
+            f"{available_kv_cache_memory / GiB_bytes:.2f} GiB.")
+        logger.info(msg)
 
         return int(available_kv_cache_memory)

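For reference, the accounting the new log message reports can be sketched as a standalone function. This is only an illustration of the arithmetic, not vLLM's API; the helper name and the example numbers below are hypothetical:

```python
GiB_bytes = 1 << 30  # bytes per GiB, matching vLLM's GiB_bytes constant


def available_kv_cache_memory(total_gpu_memory: int,
                              gpu_memory_utilization: float,
                              peak_torch_memory: int,
                              non_torch_alloc_bytes: int) -> int:
    """Sketch of the accounting in determine_available_memory(): the KV
    cache receives whatever is left of the utilization budget after the
    profiling run's peak usage (torch + non-torch)."""
    peak_memory = peak_torch_memory + non_torch_alloc_bytes
    memory_for_current_instance = total_gpu_memory * gpu_memory_utilization
    return int(memory_for_current_instance - peak_memory)


# Hypothetical numbers: an 80 GiB GPU at 90% utilization, with a profiling
# peak of 16 GiB torch memory plus 2 GiB non-torch allocations.
budget = available_kv_cache_memory(80 * GiB_bytes, 0.9,
                                   16 * GiB_bytes, 2 * GiB_bytes)
print(f"{budget / GiB_bytes:.2f} GiB")  # 80 x 0.9 - (16 + 2) = 54.00 GiB
```

The logged values are exactly these intermediate terms, so an unexpectedly small KV-cache budget can be traced to either a large non-torch share or a large torch peak.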