[Bug]: Model initialization failed #20118

Open
@zhichenggeng

Description

Your current environment

The output of python collect_env.py
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.3 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version                : Could not collect
CMake version                : Could not collect
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.7.0+cu126
Is debug build               : False
CUDA used to build PyTorch   : 12.6
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] (64-bit runtime)
Python platform              : Linux-5.15.0-1071-azure-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 11.8.89
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : 
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB

Nvidia driver version        : 550.90.07
cuDNN version                : Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.6
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               96
On-line CPU(s) list:                  0-95
Vendor ID:                            AuthenticAMD
Model name:                           AMD EPYC 7V12 64-Core Processor
CPU family:                           23
Model:                                49
Thread(s) per core:                   1
Core(s) per socket:                   48
Socket(s):                            2
Stepping:                             0
BogoMIPS:                             4890.87
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid
Hypervisor vendor:                    Microsoft
Virtualization type:                  full
L1d cache:                            3 MiB (96 instances)
L1i cache:                            3 MiB (96 instances)
L2 cache:                             48 MiB (96 instances)
L3 cache:                             384 MiB (24 instances)
NUMA node(s):                         4
NUMA node0 CPU(s):                    0-23
NUMA node1 CPU(s):                    24-47
NUMA node2 CPU(s):                    48-71
NUMA node3 CPU(s):                    72-95
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; untrained return thunk; SMT disabled
Vulnerability Spec rstack overflow:   Mitigation; safe RET, no microcode
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

==============================
Versions of relevant libraries
==============================
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-ml-py==12.570.86
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] pyzmq==26.4.0
[pip3] torch==2.7.0
[pip3] torchaudio==2.7.0
[pip3] torchvision==0.22.0
[pip3] transformers==4.51.3
[pip3] triton==3.3.0
[conda] No relevant packages

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
Neuron SDK Version           : N/A
vLLM Version                 : 0.9.1
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    NIC8    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV12    NODE    SYS     SYS     NODE    NODE    SYS     SYS     SYS     SYS     0-23    0               N/A
GPU1    NV12     X      SYS     SYS     SYS     SYS     SYS     NODE    NODE    SYS     SYS     72-95   3               N/A
NIC0    NODE    SYS      X      SYS     SYS     NODE    NODE    SYS     SYS     SYS     SYS
NIC1    SYS     SYS     SYS      X      NODE    SYS     SYS     SYS     SYS     SYS     SYS
NIC2    SYS     SYS     SYS     NODE     X      SYS     SYS     SYS     SYS     SYS     SYS
NIC3    NODE    SYS     NODE    SYS     SYS      X      NODE    SYS     SYS     SYS     SYS
NIC4    NODE    SYS     NODE    SYS     SYS     NODE     X      SYS     SYS     SYS     SYS
NIC5    SYS     NODE    SYS     SYS     SYS     SYS     SYS      X      NODE    SYS     SYS
NIC6    SYS     NODE    SYS     SYS     SYS     SYS     SYS     NODE     X      SYS     SYS
NIC7    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      NODE
NIC8    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE     X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_an0
  NIC1: mlx5_ib0
  NIC2: mlx5_ib1
  NIC3: mlx5_ib2
  NIC4: mlx5_ib3
  NIC5: mlx5_ib4
  NIC6: mlx5_ib5
  NIC7: mlx5_ib6
  NIC8: mlx5_ib7

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=GPU-d5fa1edc-cdb0-2e2e-a38f-533238b5a62b,GPU-d879a2ab-52dd-73de-8dac-9545763a254e
NVIDIA_REQUIRE_CUDA=cuda>=11.8 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471
NCCL_IB_PCI_RELAXED_ORDERING=1
TORCH_DISTRIBUTED_INIT_STORE_STORAGE_ENDPOINT=MustSpecifyViaApplicationParameters
NCCL_VERSION=2.15.5-1
NCCL_SOCKET_IFNAME=eth0
NCCL_NET_GDR_LEVEL=5
NCCL_DEBUG_SUBSYS=INIT,GRAPH
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NCCL_DEBUG=INFO
NCCL_IB_HCA=
NVIDIA_PRODUCT_NAME=CUDA
TORCH_DISTRIBUTED_INIT_STORE_STORAGE_NAME=MustSpecifyViaApplicationParameters
CUDA_DEVICE_ORDER=PCI_BUS_ID
CUDA_VERSION=11.8.0
TORCH_DISTRIBUTED_INIT_STORE_MOUNT_PERMISSIONS=MustSpecifyViaApplicationParameters
TORCH_DISTRIBUTED_INIT_STORE_MOUNT_PATH=MustSpecifyViaApplicationParameters
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
TORCH_DISTRIBUTED_INIT_STORE_STORAGE_KIND=None
NCCL_TOPO_FILE=/opt/microsoft/ndv4-topo.xml
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY

🐛 Describe the bug

Engine core initialization fails during LLM construction, even for a small model like facebook/opt-125m.

Code example:

from vllm import LLM

llm = LLM(model="facebook/opt-125m")

Error:

DEBUG 06-26 08:13:27 [__init__.py:31] No plugins for group vllm.platform_plugins found.
DEBUG 06-26 08:13:27 [__init__.py:35] Checking if TPU platform is available.
DEBUG 06-26 08:13:27 [__init__.py:45] TPU platform is not available because: No module named 'libtpu'
DEBUG 06-26 08:13:27 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 06-26 08:13:27 [__init__.py:72] Confirmed CUDA platform is available.
DEBUG 06-26 08:13:27 [__init__.py:100] Checking if ROCm platform is available.
DEBUG 06-26 08:13:27 [__init__.py:114] ROCm platform is not available because: No module named 'amdsmi'
DEBUG 06-26 08:13:27 [__init__.py:121] Checking if HPU platform is available.
DEBUG 06-26 08:13:27 [__init__.py:128] HPU platform is not available because habana_frameworks is not found.
DEBUG 06-26 08:13:27 [__init__.py:138] Checking if XPU platform is available.
DEBUG 06-26 08:13:27 [__init__.py:148] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
DEBUG 06-26 08:13:27 [__init__.py:155] Checking if CPU platform is available.
DEBUG 06-26 08:13:27 [__init__.py:177] Checking if Neuron platform is available.
DEBUG 06-26 08:13:27 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 06-26 08:13:27 [__init__.py:72] Confirmed CUDA platform is available.
INFO 06-26 08:13:27 [__init__.py:244] Automatically detected platform cuda.
DEBUG 06-26 08:13:30 [__init__.py:39] Available plugins for group vllm.general_plugins:
DEBUG 06-26 08:13:30 [__init__.py:41] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
DEBUG 06-26 08:13:30 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 06-26 08:13:49 [config.py:823] This model supports multiple tasks: {'embed', 'classify', 'reward', 'score', 'generate'}. Defaulting to 'generate'.
DEBUG 06-26 08:13:50 [arg_utils.py:1600] Setting max_num_batched_tokens to 8192 for LLM_CLASS usage context.
DEBUG 06-26 08:13:50 [arg_utils.py:1607] Setting max_num_seqs to 256 for LLM_CLASS usage context.
INFO 06-26 08:13:50 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 06-26 08:13:55 [core.py:455] Waiting for init message from front-end.
DEBUG 06-26 08:13:55 [utils.py:547] HELLO from local core engine process 0.
DEBUG 06-26 08:13:55 [core.py:463] Received init message: EngineHandshakeMetadata(addresses=EngineZmqAddresses(inputs=['ipc:///tmp/0b5a965a-683a-4d43-a75f-1d8a0e790794'], outputs=['ipc:///tmp/68f79e54-444c-4326-add5-b20daf4463ce'], coordinator_input=None, coordinator_output=None), parallel_config={'data_parallel_master_ip': '127.0.0.1', 'data_parallel_master_port': 0, 'data_parallel_size': 1})
INFO 06-26 08:13:55 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=facebook/opt-125m, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
DEBUG 06-26 08:13:55 [decorators.py:110] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.llama.LlamaModel'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
DEBUG 06-26 08:13:55 [decorators.py:110] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.llama_eagle3.LlamaModel'>: ['input_ids', 'positions', 'hidden_states']
WARNING 06-26 08:13:56 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f93add2e450>
DEBUG 06-26 08:13:56 [config.py:4677] enabled custom ops: Counter()
DEBUG 06-26 08:13:56 [config.py:4679] disabled custom ops: Counter()
DEBUG 06-26 08:13:57 [parallel_state.py:918] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://100.64.72.58:48903 backend=nccl
INFO 06-26 08:13:57 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING 06-26 08:13:57 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
DEBUG 06-26 08:13:57 [decorators.py:110] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.opt.OPTModel'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
DEBUG 06-26 08:13:57 [config.py:4677] enabled custom ops: Counter()
DEBUG 06-26 08:13:57 [config.py:4679] disabled custom ops: Counter()
INFO 06-26 08:13:57 [gpu_model_runner.py:1595] Starting to load model facebook/opt-125m...
INFO 06-26 08:13:57 [gpu_model_runner.py:1600] Loading model from scratch...
INFO 06-26 08:13:57 [cuda.py:252] Using Flash Attention backend on V1 engine.
DEBUG 06-26 08:13:57 [backends.py:38] Using InductorAdaptor
DEBUG 06-26 08:13:58 [config.py:4677] enabled custom ops: Counter()
DEBUG 06-26 08:13:58 [config.py:4679] disabled custom ops: Counter()
INFO 06-26 08:13:59 [weight_utils.py:292] Using model weights format ['*.bin']
Loading pt checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  6.03it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  6.03it/s]

INFO 06-26 08:13:59 [default_loader.py:272] Loading weights took 0.17 seconds
INFO 06-26 08:14:00 [gpu_model_runner.py:1624] Model loading took 0.2389 GiB and 1.752694 seconds
DEBUG 06-26 08:14:00 [decorators.py:204] Start compiling function <code object forward at 0x7f952dda2b50, file "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/model_executor/models/opt.py", line 305>
DEBUG 06-26 08:14:01 [backends.py:412] Traced files (to be considered for compilation cache):
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/torch/_dynamo/polyfills/__init__.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/torch/nn/modules/activation.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/torch/nn/modules/container.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/torch/nn/modules/module.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/torch/nn/modules/normalization.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/torch/nn/modules/sparse.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/vllm/attention/layer.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/vllm/distributed/communication_op.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/vllm/distributed/parallel_state.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/vllm/model_executor/layers/utils.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/vllm/model_executor/layers/vocab_parallel_embedding.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/vllm/model_executor/models/opt.py
DEBUG 06-26 08:14:01 [backends.py:412] /home/aiscuser/.local/lib/python3.11/site-packages/vllm/platforms/interface.py
INFO 06-26 08:14:02 [backends.py:462] Using cache directory: /home/aiscuser/.cache/vllm/torch_compile_cache/1e862139d7/rank_0_0 for vLLM's torch.compile
INFO 06-26 08:14:02 [backends.py:472] Dynamo bytecode transform time: 1.92 s
DEBUG 06-26 08:14:02 [fix_functionalization.py:104] De-functionalized 0 nodes, removed 0 nodes
DEBUG 06-26 08:14:02 [vllm_inductor_pass.py:56] FixFunctionalizationPass completed in 0.2 ms
Traceback (most recent call last):
  File "/tmp/output/1341d0e5-e512-4bf9-8cd3-57f8a2dcd890_dffb8f6d/example.py", line 3, in <module>
    llm = LLM(model="facebook/opt-125m")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 243, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 501, in from_engine_args
    return engine_cls.from_vllm_config(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 124, in from_vllm_config
    return cls(vllm_config=vllm_config,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 101, in __init__
    self.engine_core = EngineCoreClient.make_client(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 75, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 558, in __init__
    super().__init__(
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__
    self._init_engines_direct(vllm_config, local_only,
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct
    self._wait_for_engine_startup(handshake_socket, input_address,
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup
    wait_for_engine_startup(
  File "/home/aiscuser/.local/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

The error is raised by this check in wait_for_engine_startup (vllm/v1/utils.py):

if len(events) > 1 or events[0][0] != handshake_socket:

Here is the output I got by adding print statements just before the failing check:

events:  [(35, 4)]
handshake_socket:  <zmq.Socket(zmq.ROUTER) at 0x7f825d446890>
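
These values make the condition true no matter what: a bare int can never compare equal to the socket object. A tiny reconstruction with stand-in values (hypothetical, not vLLM source):

# Hypothetical stand-ins for the values printed above (not vLLM source):
events = [(35, 4)]           # poll() result: (integer fd, event mask)
handshake_socket = object()  # placeholder for the zmq.ROUTER socket

# An int never compares equal to the socket object, so the failure branch
# fires even though the event apparently came from the handshake socket:
if len(events) > 1 or events[0][0] != handshake_socket:
    raise RuntimeError("Engine core initialization failed. "
                       "See root cause above. Failed core proc(s): {}")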

From the pyzmq documentation (https://github.com/zeromq/pyzmq/blob/a4b9d0d421b7a70c88efb351ce1e2aead0ea0cd3/zmq/sugar/poll.py#L95-L100), poll() can return an integer fd as the first element of each tuple instead of the 0MQ Socket. I have no idea why that is happening here, but it is probably the cause.
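
For what it's worth, the Socket object only comes back for items registered as zmq.Socket instances; anything registered by raw file descriptor round-trips as the integer fd. A minimal sketch showing both shapes (standalone demo assuming stock pyzmq; the inproc endpoint name is made up):

import zmq

ctx = zmq.Context.instance()
a = ctx.socket(zmq.PAIR)
a.bind("inproc://poll-demo")  # made-up endpoint for the demo
b = ctx.socket(zmq.PAIR)
b.connect("inproc://poll-demo")
b.send(b"ping")  # make `a` readable

# Registered as a zmq.Socket: poll() returns (Socket, event_mask).
poller = zmq.Poller()
poller.register(a, zmq.POLLIN)
print(poller.poll(timeout=1000))  # e.g. [(<zmq.Socket ...>, 1)]

# Registered as a raw fd: poll() returns (int, event_mask) instead,
# the same [(35, 4)]-style shape seen in the vLLM startup path.
poller_fd = zmq.Poller()
poller_fd.register(a.fileno(), zmq.POLLIN)
print(poller_fd.poll(timeout=1000))  # e.g. [(35, 1)]

So if the handshake socket was registered as a zmq.Socket but still comes back as a bare fd, something may be preventing pyzmq from recognizing it as one of its own Socket instances.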

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
