Description
Your current environment
The output of `python collect_env.py`
vLLM container: 0.7.2
Startup script:
```bash
bash run_cluster.sh \
    docker-hub.dahuatech.com/vllm/vllm-openai:v0.7.2 \
    10.12.167.20 \
    --head \
    /root/wangjianqiang/deepseek/DeepSeek-R1/DeepSeek-R1/ \
    -e VLLM_HOST_IP=$(hostname -I | awk '{print $1}') \
    -e "GLOO_SOCKET_IFNAME=ens121f0" \
    -e "NCCL_SOCKET_IFNAME=ens121f0" \
    -v /root/wangjianqiang/deepseek/DeepSeek-R1/:/root/deepseek_r1/
```
```bash
bash run_cluster.sh \
    docker-hub.dahuatech.com/vllm/vllm-openai:v0.7.2 \
    10.12.167.20 \
    --worker \
    /root/wangjianqiang/deepseek/DeepSeek-R1/DeepSeek-R1/ \
    -e VLLM_HOST_IP=$(hostname -I | awk '{print $1}') \
    -e "GLOO_SOCKET_IFNAME=ens121f0" \
    -e "NCCL_SOCKET_IFNAME=ens121f0" \
    -v /root/deepseek_r1/:/root/deepseek_r1/
```
Startup command (on the head node):
```bash
root@admin:~/deepseek_r1/DeepSeek-R1# VLLM_HOST_IP=$(hostname -I | awk '{print $1}')
root@admin:~/deepseek_r1/DeepSeek-R1# export VLLM_HOST_IP
root@admin:~/deepseek_r1/DeepSeek-R1# NCCL_DEBUG=TRACE vllm serve /root/deepseek_r1/DeepSeek-R1 --tensor-parallel-size 16 --trust-remote-code
```
The following error occurred:
```
ERROR 02-09 02:31:11 engine.py:389] Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
ERROR 02-09 02:31:11 engine.py:389] Traceback (most recent call last):
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 380, in run_mp_engine
ERROR 02-09 02:31:11 engine.py:389]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 02-09 02:31:11 engine.py:389]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 123, in from_engine_args
ERROR 02-09 02:31:11 engine.py:389]     return cls(ipc_path=ipc_path,
ERROR 02-09 02:31:11 engine.py:389]            ^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 75, in __init__
ERROR 02-09 02:31:11 engine.py:389]     self.engine = LLMEngine(*args, **kwargs)
ERROR 02-09 02:31:11 engine.py:389]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in __init__
ERROR 02-09 02:31:11 engine.py:389]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 02-09 02:31:11 engine.py:389]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 262, in __init__
ERROR 02-09 02:31:11 engine.py:389]     super().__init__(*args, **kwargs)
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 51, in __init__
ERROR 02-09 02:31:11 engine.py:389]     self._init_executor()
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 90, in _init_executor
ERROR 02-09 02:31:11 engine.py:389]     self._init_workers_ray(placement_group)
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 227, in _init_workers_ray
ERROR 02-09 02:31:11 engine.py:389]     raise ValueError(
ERROR 02-09 02:31:11 engine.py:389] ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 380, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 123, in from_engine_args
    return cls(ipc_path=ipc_path,
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 75, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 262, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 51, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 90, in _init_executor
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 227, in _init_workers_ray
    raise ValueError(
ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
*** SIGTERM received at time=1739097071 on cpu 95 ***
PC: @ 0x7fa5c96777f8 (unknown) clock_nanosleep
    @ 0x7fa5c95d4520 (unknown) (unknown)
    @ ... and at least 3 more frames
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460: *** SIGTERM received at time=1739097071 on cpu 95 ***
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460: PC: @ 0x7fa5c96777f8 (unknown) clock_nanosleep
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460:     @ 0x7fa5c95d4520 (unknown) (unknown)
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460:     @ ... and at least 3 more frames
Exception ignored in atexit callback: <function shutdown at 0x7fa44c55bd80>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1910, in shutdown
    time.sleep(0.5)
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1499, in sigterm_handler
    sys.exit(signum)
SystemExit: 15
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 204, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 44, in serve
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
    async with build_async_engine_client(args) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 230, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
```
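For context, the `ValueError` above is raised by a GPU-ownership check in `vllm/executor/ray_distributed_executor.py`: when the Ray placement group is built, the node running the driver must be assigned at least one GPU bundle. The sketch below is only an illustration of that condition (the function name and data layout are hypothetical, not vLLM's actual internals):

```python
# Illustrative sketch of the condition behind the error above (names and
# data layout are hypothetical, not vLLM's real code): the Ray placement
# group must put at least one GPU bundle on the driver's node.
def driver_node_has_gpu(bundles, driver_node_id):
    """bundles: list of (node_id, num_gpus) pairs from the placement group."""
    return any(node == driver_node_id and gpus > 0 for node, gpus in bundles)

# If the head container cannot see its local GPUs (e.g. it was started
# without GPU access), all 16 GPUs come from the workers and the check fails:
print(driver_node_has_gpu([("worker-1", 8), ("worker-2", 8)], "head"))  # False
print(driver_node_has_gpu([("head", 8), ("worker-1", 8)], "head"))      # True
```

A common cause is therefore the head container not registering its local GPUs with Ray; running `nvidia-smi` inside the head container and `ray status` before `vllm serve` can confirm whether the head node contributes any GPUs to the cluster.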
🐛 Describe the bug