From 0b7d0503241724c837588c5b1d3cfa6a926906d8 Mon Sep 17 00:00:00 2001
From: youkaichao
Date: Mon, 24 Jun 2024 00:37:42 -0700
Subject: [PATCH] [doc][faq] add warning to download models for every nodes (#5783)

Signed-off-by: Alvant
---
 docs/source/serving/distributed_serving.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/source/serving/distributed_serving.rst b/docs/source/serving/distributed_serving.rst
index b0c45dbf70268..2a7937a9189c1 100644
--- a/docs/source/serving/distributed_serving.rst
+++ b/docs/source/serving/distributed_serving.rst
@@ -35,4 +35,7 @@ To scale vLLM beyond a single machine, install and start a `Ray runtime
-After that, you can run inference and serving on multiple machines by launching the vLLM process on the head node by setting :code:`tensor_parallel_size` to the number of GPUs to be the total number of GPUs across all machines.
\ No newline at end of file
+After that, you can run inference and serving on multiple machines by launching the vLLM process on the head node by setting :code:`tensor_parallel_size` to the number of GPUs to be the total number of GPUs across all machines.
+
+.. warning::
+    Please make sure you downloaded the model to all the nodes, or the model is downloaded to some distributed file system that is accessible by all nodes.