Add comments on swap space (vllm-project#154)
WoosukKwon authored Jun 18, 2023
1 parent dcda03b commit 3f92038
Showing 2 changed files with 8 additions and 2 deletions.
3 changes: 2 additions & 1 deletion benchmarks/benchmark_serving.py
@@ -3,7 +3,8 @@
 On the server side, run one of the following commands:
     (vLLM backend)
     python -m vllm.entrypoints.api_server \
-        --disable-log-requests --model <your_model>
+        --model <your_model> --swap-space 16 \
+        --disable-log-requests
     (TGI backend)
     ./launch_hf_server.sh <your_model>
7 changes: 6 additions & 1 deletion vllm/core/scheduler.py
@@ -409,7 +409,12 @@ def _swap_out(
         seq_group: SequenceGroup,
         blocks_to_swap_out: Dict[int, int],
     ) -> None:
-        assert self.block_manager.can_swap_out(seq_group)
+        if not self.block_manager.can_swap_out(seq_group):
+            # FIXME(woosuk): Abort the sequence group instead of aborting the
+            # entire engine.
+            raise RuntimeError(
+                "Aborted due to the lack of CPU swap space. Please increase "
+                "the swap space to avoid this error.")
         mapping = self.block_manager.swap_out(seq_group)
         blocks_to_swap_out.update(mapping)
         for seq in seq_group.get_seqs(status=SequenceStatus.RUNNING):
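The scheduler change replaces a bare `assert` with an explicit `RuntimeError` carrying an actionable message, so a user who runs out of CPU swap space sees why the engine stopped and how to fix it. The following is a minimal self-contained sketch of that guard pattern; `ToyBlockManager`, `free_cpu_blocks`, and `swap_out` here are hypothetical stand-ins for illustration, not vLLM's actual API:

```python
class ToyBlockManager:
    """Hypothetical stand-in for a KV-cache block manager with a
    fixed pool of CPU blocks available for swapping."""

    def __init__(self, free_cpu_blocks: int) -> None:
        self.free_cpu_blocks = free_cpu_blocks

    def can_swap_out(self, needed_blocks: int) -> bool:
        return needed_blocks <= self.free_cpu_blocks


def swap_out(manager: ToyBlockManager, needed_blocks: int) -> None:
    if not manager.can_swap_out(needed_blocks):
        # Mirrors the commit: fail loudly with guidance instead of
        # tripping a bare assert (which vanishes under `python -O`).
        raise RuntimeError(
            "Aborted due to the lack of CPU swap space. Please increase "
            "the swap space to avoid this error.")
    manager.free_cpu_blocks -= needed_blocks


manager = ToyBlockManager(free_cpu_blocks=4)
swap_out(manager, 3)  # fits; one block remains
try:
    swap_out(manager, 2)  # does not fit; raises with a clear message
except RuntimeError as exc:
    print("raised:", exc)
```

One practical reason to prefer the exception over the assert: `assert` statements are stripped when Python runs with optimizations enabled, while the `RuntimeError` fires unconditionally and its message points directly at the `--swap-space` remedy.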
