[Docs] Update documentation for gpu-memory-utilization option (vllm-p…
SuhongMoon authored Dec 17, 2023
1 parent 671af2b commit 3ec8c25
Showing 1 changed file with 4 additions and 2 deletions.
docs/source/models/engine_args.rst (4 additions, 2 deletions)
@@ -89,9 +89,11 @@ Below, you can find an explanation of every engine argument for vLLM:
 
     CPU swap space size (GiB) per GPU.
 
-.. option:: --gpu-memory-utilization <percentage>
+.. option:: --gpu-memory-utilization <fraction>
 
-    The percentage of GPU memory to be used for the model executor.
+    The fraction of GPU memory to be used for the model executor, which can range from 0 to 1.
+    For example, a value of 0.5 would imply 50% GPU memory utilization.
+    If unspecified, will use the default value of 0.9.
 
 .. option:: --max-num-batched-tokens <tokens>
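For reference, a minimal sketch of how the documented option is used, assuming vLLM's Python entry point (the `LLM` constructor mirrors the flag as the `gpu_memory_utilization` keyword argument; the model name is only a placeholder):

    # Minimal usage sketch: cap the model executor's GPU memory budget.
    # Assumption: vllm.LLM exposes --gpu-memory-utilization as a keyword
    # argument; "facebook/opt-125m" is a placeholder model.
    from vllm import LLM

    # Use at most 50% of each GPU's memory for the model executor;
    # omitting the argument falls back to the documented default of 0.9.
    llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.5)
    outputs = llm.generate(["Hello, my name is"])

The equivalent server invocation would pass the flag directly, e.g. `python -m vllm.entrypoints.api_server --model facebook/opt-125m --gpu-memory-utilization 0.5`.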
