[Docs] Add Tutorials for Online Serving on Multi Machine (vllm-project#120)
Signed-off-by: SidaoY <1024863041@qq.com>
Co-authored-by: yx0716 <jinyx1007@foxmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
```

Start the vLLM server on the head node:

```shell
export VLLM_HOST_IP={head_node_ip}
export HCCL_CONNECT_TIMEOUT=120
export ASCEND_PROCESS_LOG_PATH={plog_save_path}
export HCCL_IF_IP={head_node_ip}

if [ -d "{plog_save_path}" ]; then
    rm -rf {plog_save_path}
    echo ">>> remove {plog_save_path}"
fi

LOG_FILE="multinode_$(date +%Y%m%d_%H%M).log"
VLLM_TORCH_PROFILER_DIR=./vllm_profile
python -m vllm.entrypoints.openai.api_server \
    --model="Deepseek/DeepSeek-V2-Lite-Chat" \
    --trust-remote-code \
    --enforce-eager \
    --max-model-len {max_model_len} \
    --distributed_executor_backend "ray" \
    --tensor-parallel-size 16 \
    --disable-log-requests \
    --disable-log-stats \
    --disable-frontend-multiprocessing \
    --port {port_num}
```
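Startup across 16 devices can take several minutes, so it is convenient to wait for the server's `/health` endpoint before sending requests. The sketch below is a minimal stdlib-only poller; the base URL and timeout values are illustrative, not part of the tutorial's commands:

```python
import time
from urllib import error, request

def wait_for_server(base_url: str, timeout_s: float = 300.0) -> bool:
    """Poll the server's /health endpoint until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (error.URLError, OSError):
            time.sleep(5)  # server not up yet; retry
    return False

# Example (replace 8000 with your actual {port_num}):
# wait_for_server("http://127.0.0.1:8000")
```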

Once your server is started, you can query the model with input prompts:

```shell
curl -X POST http://127.0.0.1:{port_num}/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Deepseek/DeepSeek-V2-Lite-Chat",
        "prompt": "The future of AI is",
        "max_tokens": 24
    }'
```
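The same request can be issued from Python using only the standard library. This is a sketch: the port `8000` stands in for `{port_num}`, and `build_completion_request` is a helper name chosen here, not a vLLM API:

```python
import json
from urllib import request

def build_completion_request(url: str, model: str, prompt: str, max_tokens: int) -> request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})
    return request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request(
    "http://127.0.0.1:8000/v1/completions",  # replace 8000 with your {port_num}
    "Deepseek/DeepSeek-V2-Lite-Chat",
    "The future of AI is",
    24,
)
# Sending it requires the server from the previous step to be running:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```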

If you query the server successfully, you will see a response like the one below on the client:

```json
{"id":"cmpl-6dfb5a8d8be54d748f0783285dd52303","object":"text_completion","created":1739957835,"model":"/home/data/DeepSeek-V2-Lite-Chat/","choices":[{"index":0,"text":" heavily influenced by neuroscience and cognitiveGuionistes. The goalochondria is to combine the efforts of researchers, technologists,","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":6,"total_tokens":30,"completion_tokens":24,"prompt_tokens_details":null}}
```
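When consuming the endpoint programmatically, the response body is plain JSON and can be unpacked with the standard `json` module. A minimal parsing sketch, using the example response shown above as input:

```python
import json

# Raw response body as returned by the server (copied from the example above).
raw = '{"id":"cmpl-6dfb5a8d8be54d748f0783285dd52303","object":"text_completion","created":1739957835,"model":"/home/data/DeepSeek-V2-Lite-Chat/","choices":[{"index":0,"text":" heavily influenced by neuroscience and cognitiveGuionistes. The goalochondria is to combine the efforts of researchers, technologists,","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":6,"total_tokens":30,"completion_tokens":24,"prompt_tokens_details":null}}'

resp = json.loads(raw)
completion = resp["choices"][0]["text"]       # the generated continuation
finish = resp["choices"][0]["finish_reason"]  # "length" means max_tokens was hit
used = resp["usage"]["completion_tokens"]     # 24
print(finish, used)  # prints: length 24
```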

Logs of the vLLM server:

```
INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
```