vLLM 0.5.4 failure to start the TP+ PP mode on 8 ARC #12081
After modifying /usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py, vLLM starts with the following warning:
2024:09:14-11:02:35:( 241) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices
The first issue could be solved by this modification:
We were unable to reproduce the second issue in our environment. It may be related to settings in the startup container script.
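The CCL_WARN line above names its own mitigation: oneCCL's topology recognition can be disabled through an environment variable, which makes it assume XeLink connectivity between devices. A minimal sketch (whether this is appropriate depends on your actual interconnect; set it in the container before launching vLLM):

```shell
# From the warning text: disable oneCCL's PCIe topology recognition so it
# assumes XeLinks across devices. Must be exported before the vLLM launch.
export CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0
printenv CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK
```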
The vLLM docker image is
intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1
The vLLM start command is
model="/llm/models/Qwen2-72B-Instruct/"
served_model_name="Qwen2-72B-Instruct"
source /opt/intel/1ccl-wks/setvars.sh
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --served-model-name $served_model_name \
  --port 8000 \
  --model $model \
  --trust-remote-code \
  --gpu-memory-utilization 0.85 \
  --device xpu \
  --dtype float16 \
  --enforce-eager \
  --load-in-low-bit fp8 \
  --max-model-len 2048 \
  --max-num-batched-tokens 2048 \
  --max-num-seqs 24 \
  -tp 4 -pp 2 --disable-log-requests
The error information is
(WrapperWithLoadBit pid=35347) 2024:09:13-11:21:50:(35347) |CCL_ERROR| exchange_utils.cpp:202 sendmsg_fd: condition !check_msg_retval("sendmsg", send_bytes, iov, msg, sizeof(u.cntr_buf), sock, fd) failed
(WrapperWithLoadBit pid=35347) errno: Broken pipe
2024:09:13-11:21:50:(31157) |CCL_ERROR| exchange_utils.cpp:202 sendmsg_fd: condition !check_msg_retval("sendmsg", send_bytes, iov, msg, sizeof(u.cntr_buf), sock, fd) failed
errno: Broken pipe
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] Error executing method init_device. This might cause deadlock in distributed execution.
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] Traceback (most recent call last):
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 378, in execute_method
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] return executor(*args, **kwargs)
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] ^^^^^^^^^^^^^^^^^^^^^^^^^
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 105, in init_device
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] self.init_worker_distributed_environment()
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 205, in init_worker_distributed_environment
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] get_pp_group().all_reduce(torch.zeros(1).xpu())
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/distributed/parallel_state.py", line 293, in all_reduce
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] torch.distributed.all_reduce(input_, group=self.device_group)
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] return func(*args, **kwargs)
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] ^^^^^^^^^^^^^^^^^^^^^
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/torch/distributed/distributed_c10d.py", line 2055, in all_reduce
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] work.wait()
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] RuntimeError: oneCCL: exchange_utils.cpp:202 sendmsg_fd: EXCEPTION: errno: Broken pipe
ERROR 09-13 11:21:51 worker_base.py:386] Error executing method init_device. This might cause deadlock in distributed execution.
ERROR 09-13 11:21:51 worker_base.py:386] Traceback (most recent call last):
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 378, in execute_method
ERROR 09-13 11:21:51 worker_base.py:386] return executor(*args, **kwargs)
ERROR 09-13 11:21:51 worker_base.py:386] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 105, in init_device
ERROR 09-13 11:21:51 worker_base.py:386] self.init_worker_distributed_environment()
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 205, in init_worker_distributed_environment
ERROR 09-13 11:21:51 worker_base.py:386] get_pp_group().all_reduce(torch.zeros(1).xpu())
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/distributed/parallel_state.py", line 293, in all_reduce
ERROR 09-13 11:21:51 worker_base.py:386] torch.distributed.all_reduce(input_, group=self.device_group)
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
ERROR 09-13 11:21:51 worker_base.py:386] return func(*args, **kwargs)
ERROR 09-13 11:21:51 worker_base.py:386] ^^^^^^^^^^^^^^^^^^^^^
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/torch/distributed/distributed_c10d.py", line 2055, in all_reduce
ERROR 09-13 11:21:51 worker_base.py:386] work.wait()
ERROR 09-13 11:21:51 worker_base.py:386] RuntimeError: oneCCL: exchange_utils.cpp:202 sendmsg_fd: EXCEPTION: errno: Broken pipe
Process Process-65:
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/entrypoints/openai/rpc/server.py", line 220, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, usage_context, port, load_in_low_bit)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/entrypoints/openai/rpc/server.py", line 27, in __init__
self.engine = AsyncLLMEngine.from_engine_args(async_engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 43, in from_engine_args
return super().from_engine_args(engine_args, start_engine_loop, usage_context, stat_loggers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 476, in from_engine_args
engine = cls(
^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 29, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 381, in __init__
self.engine = self._init_engine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 557, in _init_engine
return engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/llm_engine.py", line 255, in __init__
self.model_executor = executor_class(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_xpu_executor.py", line 35, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 555, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/xpu_executor.py", line 53, in __init__
self._init_executor()
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 61, in _init_executor
self._init_workers_ray(placement_group)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 230, in _init_workers_ray
self._run_workers("init_device")
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 468, in _run_workers
self.driver_worker.execute_method(method, *driver_args,
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 387, in execute_method
raise e
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 378, in execute_method
return executor(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 105, in init_device
self.init_worker_distributed_environment()
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 205, in init_worker_distributed_environment
get_pp_group().all_reduce(torch.zeros(1).xpu())
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/distributed/parallel_state.py", line 293, in all_reduce
torch.distributed.all_reduce(input_, group=self.device_group)
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/distributed_c10d.py", line 2055, in all_reduce
work.wait()
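The tracebacks all bottom out in the same place: `init_worker_distributed_environment` runs `get_pp_group().all_reduce(torch.zeros(1).xpu())` and oneCCL dies with a broken pipe. A sketch for isolating that collective from vLLM entirely, as a standalone repro (an assumption: package names and mpirun-style launch follow the usual `oneccl_bindings_for_pytorch` examples and may need adjusting for this container; the launch line is commented out because it needs XPU hardware):

```shell
# Write a minimal torch.distributed + oneCCL all_reduce repro script.
# If this fails too, the problem is in the oneCCL/driver setup, not vLLM.
cat > /tmp/ccl_allreduce_repro.py <<'EOF'
import torch
import torch.distributed as dist
import intel_extension_for_pytorch   # noqa: F401  registers the xpu device
import oneccl_bindings_for_pytorch   # noqa: F401  registers the "ccl" backend

# Reads MASTER_ADDR/MASTER_PORT and RANK/WORLD_SIZE from the environment.
dist.init_process_group(backend="ccl")
t = torch.zeros(1).xpu()
dist.all_reduce(t)  # the same collective that raised the broken pipe above
print(f"rank {dist.get_rank()}: all_reduce finished, value={t.item()}")
EOF

# Launch across the 8 ARC GPUs (not executed here -- requires XPU hardware):
# mpirun -n 8 -genv MASTER_ADDR=127.0.0.1 -genv MASTER_PORT=29500 \
#     python /tmp/ccl_allreduce_repro.py
echo "repro written to /tmp/ccl_allreduce_repro.py"
```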
The workaround is