Description
Your current environment
Still failing on main as of commit 9609327
🐛 Describe the bug
The pipeline-parallel CI test for microsoft/Phi-3.5-MoE-instruct (tp_size=1, pp_size=2, ray backend, V1 engine, dummy load format) fails at engine startup: during KV-cache memory profiling, the worker's dummy sampler run hits "CUDA error: an illegal memory access was encountered", the engine core never initializes, and the server exits before the test can issue any requests.

FAILED distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup26-ray-1-auto-test_options26]
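For reference, here is a minimal offline-API sketch of the failing configuration. This is not the actual CI harness (which drives a RemoteOpenAIServer via tests/distributed/test_pipeline_parallel.py); it assumes two GPUs on one node, and the prompt, max_tokens, and trust_remote_code flag are illustrative assumptions, not taken from the test.

```python
# Hypothetical repro sketch for the failing CI configuration
# (tp=1, pp=2, ray backend, V1 engine, dummy-initialized weights).
import os

os.environ["VLLM_USE_V1"] = "1"  # the failing test runs with vllm_major_version '1'

from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3.5-MoE-instruct",
    tensor_parallel_size=1,
    pipeline_parallel_size=2,
    distributed_executor_backend="ray",
    load_format="dummy",       # matches PPTestOptions(load_format='dummy')
    trust_remote_code=True,    # assumption; may not be required for this model
)

# The crash occurs before any request is served (during KV-cache profiling),
# so constructing the LLM should already be enough to hit it; the generate
# call below is only for completeness.
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```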
Logs
[2025-05-20T05:24:25Z] (VllmWorker rank=0 pid=10229) WARNING 05-19 22:24:25 [fused_moe.py:682] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_L4.json
[2025-05-20T05:24:27Z] (VllmWorker rank=0 pid=10229) INFO 05-19 22:24:27 [monitor.py:33] torch.compile takes 10.50 s in total
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] WorkerProc hit an exception.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] Traceback (most recent call last):
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] output = func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] self.model_runner.profile_run()
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1856, in profile_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] sampler_output = self._dummy_sampler_run(hidden_states)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1757, in _dummy_sampler_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] raise e
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1747, in _dummy_sampler_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] sampler_output = self.sampler(logits=logits,
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] sampled = self.sample(logits, sampling_metadata)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 126, in sample
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] sampling_metadata.temperature < _SAMPLING_EPS,
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] Traceback (most recent call last):
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] output = func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] self.model_runner.profile_run()
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1856, in profile_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] sampler_output = self._dummy_sampler_run(hidden_states)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1757, in _dummy_sampler_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] raise e
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1747, in _dummy_sampler_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] sampler_output = self.sampler(logits=logits,
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] sampled = self.sample(logits, sampling_metadata)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 126, in sample
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] sampling_metadata.temperature < _SAMPLING_EPS,
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] EngineCore failed to start.
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] Traceback (most recent call last):
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 480, in run_engine_core
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] engine_core = EngineCoreProc(*args, **kwargs)
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 379, in __init__
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] super().__init__(vllm_config, executor_class, log_stats,
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 74, in __init__
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] self._initialize_kv_caches(vllm_config)
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 133, in _initialize_kv_caches
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] available_gpu_memory = self.model_executor.determine_available_memory()
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] output = self.collective_rpc("determine_available_memory")
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] result = get_response(w, dequeue_timeout)
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] raise RuntimeError(
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] ', please check the stack trace above for the root cause
[2025-05-20T05:24:29Z] ERROR 05-19 22:24:29 [multiproc_executor.py:135] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
[2025-05-20T05:24:29Z] Process EngineCore_0:
[2025-05-20T05:24:29Z] Traceback (most recent call last):
[2025-05-20T05:24:29Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-05-20T05:24:29Z] self.run()
[2025-05-20T05:24:29Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-05-20T05:24:29Z] self._target(*self._args, **self._kwargs)
[2025-05-20T05:24:29Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 493, in run_engine_core
[2025-05-20T05:24:29Z] raise e
[2025-05-20T05:24:29Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 480, in run_engine_core
[2025-05-20T05:24:29Z] engine_core = EngineCoreProc(*args, **kwargs)
[2025-05-20T05:24:29Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:29Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 379, in __init__
[2025-05-20T05:24:29Z] super().__init__(vllm_config, executor_class, log_stats,
[2025-05-20T05:24:29Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 74, in __init__
[2025-05-20T05:24:29Z] self._initialize_kv_caches(vllm_config)
[2025-05-20T05:24:29Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 133, in _initialize_kv_caches
[2025-05-20T05:24:29Z] available_gpu_memory = self.model_executor.determine_available_memory()
[2025-05-20T05:24:29Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:29Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
[2025-05-20T05:24:29Z] output = self.collective_rpc("determine_available_memory")
[2025-05-20T05:24:29Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:29Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
[2025-05-20T05:24:29Z] result = get_response(w, dequeue_timeout)
[2025-05-20T05:24:29Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:29Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
[2025-05-20T05:24:29Z] raise RuntimeError(
[2025-05-20T05:24:29Z] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
[2025-05-20T05:24:29Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T05:24:29Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T05:24:29Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T05:24:29Z] ', please check the stack trace above for the root cause
[2025-05-20T05:24:31Z] Traceback (most recent call last):
[2025-05-20T05:24:31Z] File "/usr/local/bin/vllm", line 10, in <module>
[2025-05-20T05:24:31Z] sys.exit(main())
[2025-05-20T05:24:31Z] ^^^^^^
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 53, in main
[2025-05-20T05:24:31Z] args.dispatch_function(args)
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 40, in cmd
[2025-05-20T05:24:31Z] uvloop.run(run_server(args))
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
[2025-05-20T05:24:31Z] return __asyncio.run(
[2025-05-20T05:24:31Z] ^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z] File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
[2025-05-20T05:24:31Z] return runner.run(main)
[2025-05-20T05:24:31Z] ^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z] File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
[2025-05-20T05:24:31Z] return self._loop.run_until_complete(task)
[2025-05-20T05:24:31Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z] File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
[2025-05-20T05:24:31Z] return await main
[2025-05-20T05:24:31Z] ^^^^^^^^^^
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1324, in run_server
[2025-05-20T05:24:31Z] async with build_async_engine_client(args) as engine_client:
[2025-05-20T05:24:31Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z] File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[2025-05-20T05:24:31Z] return await anext(self.gen)
[2025-05-20T05:24:31Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 153, in build_async_engine_client
[2025-05-20T05:24:31Z] async with build_async_engine_client_from_engine_args(
[2025-05-20T05:24:31Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z] File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[2025-05-20T05:24:31Z] return await anext(self.gen)
[2025-05-20T05:24:31Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 185, in build_async_engine_client_from_engine_args
[2025-05-20T05:24:31Z] async_llm = AsyncLLM.from_vllm_config(
[2025-05-20T05:24:31Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 152, in from_vllm_config
[2025-05-20T05:24:31Z] return cls(
[2025-05-20T05:24:31Z] ^^^^
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 118, in __init__
[2025-05-20T05:24:31Z] self.engine_core = core_client_class(
[2025-05-20T05:24:31Z] ^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 734, in __init__
[2025-05-20T05:24:31Z] super().__init__(
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 418, in __init__
[2025-05-20T05:24:31Z] self._wait_for_engine_startup(output_address, parallel_config)
[2025-05-20T05:24:31Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 484, in _wait_for_engine_startup
[2025-05-20T05:24:31Z] raise RuntimeError("Engine core initialization failed. "
[2025-05-20T05:24:31Z] RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2025-05-20T05:24:32Z] Traceback (most recent call last):
[2025-05-20T05:24:32Z] File "/vllm-workspace/tests/utils.py", line 727, in wrapper
[2025-05-20T05:24:32Z] f(*args, **kwargs)
[2025-05-20T05:24:32Z] File "/vllm-workspace/tests/distributed/test_pipeline_parallel.py", line 422, in test_tp_language_generation
[2025-05-20T05:24:32Z] _compare_tp(model_id,
[2025-05-20T05:24:32Z] File "/vllm-workspace/tests/distributed/test_pipeline_parallel.py", line 388, in _compare_tp
[2025-05-20T05:24:32Z] compare_two_settings(model_id,
[2025-05-20T05:24:32Z] File "/vllm-workspace/tests/utils.py", line 465, in compare_two_settings
[2025-05-20T05:24:32Z] compare_all_settings(
[2025-05-20T05:24:32Z] File "/vllm-workspace/tests/utils.py", line 529, in compare_all_settings
[2025-05-20T05:24:32Z] with RemoteOpenAIServer(model,
[2025-05-20T05:24:32Z] ^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:32Z] File "/vllm-workspace/tests/utils.py", line 133, in __init__
[2025-05-20T05:24:32Z] self._wait_for_server(url=self.url_for("health"),
[2025-05-20T05:24:32Z] File "/vllm-workspace/tests/utils.py", line 161, in _wait_for_server
[2025-05-20T05:24:32Z] raise RuntimeError("Server exited unexpectedly.") from None
[2025-05-20T05:24:32Z] RuntimeError: Server exited unexpectedly.
[2025-05-20T05:24:32Z] Fork a new process to run a test 8315
[2025-05-20T05:24:32Z] FAILED
[2025-05-20T05:24:32Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup27-mp-0-auto-test_options27] Fork a new process to run a test 10565
[2025-05-20T05:24:32Z] PASSED
[2025-05-20T05:24:33Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup28-mp-1-auto-test_options28] Fork a new process to run a test 10566
[2025-05-20T05:24:33Z] PASSED
[2025-05-20T05:24:33Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup29-ray-0-auto-test_options29] Fork a new process to run a test 10567
[2025-05-20T05:24:33Z] PASSED
[2025-05-20T05:24:33Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup30-ray-1-auto-test_options30] Fork a new process to run a test 10568
[2025-05-20T05:24:33Z] PASSED
[2025-05-20T05:24:33Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup31-mp-0-auto-test_options31] Fork a new process to run a test 10569
[2025-05-20T05:24:34Z] PASSED
[2025-05-20T05:24:34Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup32-mp-1-auto-test_options32] Fork a new process to run a test 10570
[2025-05-20T05:24:34Z] PASSED
[2025-05-20T05:24:34Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup33-ray-0-auto-test_options33] Fork a new process to run a test 10571
[2025-05-20T05:24:34Z] PASSED
[2025-05-20T05:24:34Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup34-ray-1-auto-test_options34] Fork a new process to run a test 10572
[2025-05-20T05:24:34Z] PASSED
[2025-05-20T05:24:34Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup35-mp-0-auto-test_options35] Fork a new process to run a test 10573
[2025-05-20T05:24:35Z] PASSED
[2025-05-20T05:24:35Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup36-mp-1-auto-test_options36] Fork a new process to run a test 10574
[2025-05-20T05:24:35Z] PASSED
[2025-05-20T05:24:35Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup37-ray-0-auto-test_options37] Fork a new process to run a test 10575
[2025-05-20T05:24:35Z] PASSED
[2025-05-20T05:24:35Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup38-ray-1-auto-test_options38] Fork a new process to run a test 10576
[2025-05-20T05:24:36Z] PASSED
[2025-05-20T05:24:36Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup39-mp-0-auto-test_options39] Fork a new process to run a test 10577
[2025-05-20T05:24:36Z] PASSED
[2025-05-20T05:24:36Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup40-mp-1-auto-test_options40] Fork a new process to run a test 10578
[2025-05-20T05:24:36Z] PASSED
[2025-05-20T05:24:36Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup41-ray-0-auto-test_options41] Fork a new process to run a test 10579
[2025-05-20T05:24:36Z] PASSED
[2025-05-20T05:24:36Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup42-ray-1-auto-test_options42] Fork a new process to run a test 10580
[2025-05-20T05:24:37Z] PASSED
[2025-05-20T05:24:37Z] distributed/test_pipeline_parallel.py::test_tp_language_embedding[intfloat/e5-mistral-7b-instruct-parallel_setup0-mp-0-auto-test_options0] Fork a new process to run a test 10581
[2025-05-20T05:24:37Z] PASSED
[2025-05-20T05:24:37Z] distributed/test_pipeline_parallel.py::test_tp_language_embedding[BAAI/bge-multilingual-gemma2-parallel_setup1-mp-0-auto-test_options1] Fork a new process to run a test 10582
[2025-05-20T05:24:37Z] PASSED
[2025-05-20T05:24:37Z] distributed/test_pipeline_parallel.py::test_tp_multimodal_generation[OpenGVLab/InternVL2-1B-parallel_setup0-mp-0-auto-test_options0] Fork a new process to run a test 10583
[2025-05-20T05:24:37Z] PASSED
[2025-05-20T05:24:38Z] distributed/test_pipeline_parallel.py::test_tp_multimodal_generation[microsoft/Phi-3.5-vision-instruct-parallel_setup1-mp-0-auto-test_options1] Fork a new process to run a test 10584
[2025-05-20T05:24:38Z] PASSED
[2025-05-20T05:24:38Z] distributed/test_pipeline_parallel.py::test_tp_multimodal_generation[fixie-ai/ultravox-v0_5-llama-3_2-1b-parallel_setup2-mp-0-auto-test_options2] Fork a new process to run a test 10585
[2025-05-20T05:24:38Z] PASSED
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] =================================== FAILURES ===================================
[2025-05-20T05:24:38Z] _ test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup26-ray-1-auto-test_options26] _
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] args = ()
[2025-05-20T05:24:38Z] kwargs = {'distributed_backend': 'ray', 'model_id': 'microsoft/Phi-3.5-MoE-instruct', 'num_gpus_available': 2, 'parallel_setup': ParallelSetup(tp_size=1, pp_size=2, eager_mode=False, chunked_prefill=False), ...}
[2025-05-20T05:24:38Z] Skipped = <class 'Skipped'>, pid = 8315, pgid = 3893, _pid = 8315
[2025-05-20T05:24:38Z] _exitcode = 256, old_signal_handler = <Handlers.SIG_DFL: 0>
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] @functools.wraps(f)
[2025-05-20T05:24:38Z] def wrapper(*args: _P.args, **kwargs: _P.kwargs) -> None:
[2025-05-20T05:24:38Z] # Make the process the leader of its own process group
[2025-05-20T05:24:38Z] # to avoid sending SIGTERM to the parent process
[2025-05-20T05:24:38Z] os.setpgrp()
[2025-05-20T05:24:38Z] from _pytest.outcomes import Skipped
[2025-05-20T05:24:38Z] pid = os.fork()
[2025-05-20T05:24:38Z] print(f"Fork a new process to run a test {pid}")
[2025-05-20T05:24:38Z] if pid == 0:
[2025-05-20T05:24:38Z] try:
[2025-05-20T05:24:38Z] f(*args, **kwargs)
[2025-05-20T05:24:38Z] except Skipped as e:
[2025-05-20T05:24:38Z] # convert Skipped to exit code 0
[2025-05-20T05:24:38Z] print(str(e))
[2025-05-20T05:24:38Z] os._exit(0)
[2025-05-20T05:24:38Z] except Exception:
[2025-05-20T05:24:38Z] import traceback
[2025-05-20T05:24:38Z] traceback.print_exc()
[2025-05-20T05:24:38Z] os._exit(1)
[2025-05-20T05:24:38Z] else:
[2025-05-20T05:24:38Z] os._exit(0)
[2025-05-20T05:24:38Z] else:
[2025-05-20T05:24:38Z] pgid = os.getpgid(pid)
[2025-05-20T05:24:38Z] _pid, _exitcode = os.waitpid(pid, 0)
[2025-05-20T05:24:38Z] # ignore SIGTERM signal itself
[2025-05-20T05:24:38Z] old_signal_handler = signal.signal(signal.SIGTERM, signal.SIG_IGN)
[2025-05-20T05:24:38Z] # kill all child processes
[2025-05-20T05:24:38Z] os.killpg(pgid, signal.SIGTERM)
[2025-05-20T05:24:38Z] # restore the signal handler
[2025-05-20T05:24:38Z] signal.signal(signal.SIGTERM, old_signal_handler)
[2025-05-20T05:24:38Z] > assert _exitcode == 0, (f"function {f} failed when called with"
[2025-05-20T05:24:38Z] f" args {args} and kwargs {kwargs}")
[2025-05-20T05:24:38Z] E AssertionError: function <function test_tp_language_generation at 0x7f24d8990360> failed when called with args () and kwargs {'model_id': 'microsoft/Phi-3.5-MoE-instruct', 'parallel_setup': ParallelSetup(tp_size=1, pp_size=2, eager_mode=False, chunked_prefill=False), 'distributed_backend': 'ray', 'vllm_major_version': '1', 'task': 'auto', 'test_options': PPTestOptions(multi_node_only=True, load_format='dummy'), 'num_gpus_available': 2}
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] utils.py:747: AssertionError
[2025-05-20T05:24:38Z] =============================== warnings summary ===============================
[2025-05-20T05:24:38Z] ../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
[2025-05-20T05:24:38Z] /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
[2025-05-20T05:24:38Z] ref_error: type[Exception] = jsonschema.RefResolutionError,
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] tests/distributed/test_pipeline_parallel.py: 48 warnings
[2025-05-20T05:24:38Z] /vllm-workspace/tests/utils.py:723: DeprecationWarning: This process (pid=3893) is multi-threaded, use of fork() may lead to deadlocks in the child.
[2025-05-20T05:24:38Z] pid = os.fork()
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
[2025-05-20T05:24:38Z] =========================== short test summary info ============================
[2025-05-20T05:24:38Z] FAILED distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup26-ray-1-auto-test_options26]
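As the traceback itself notes, the illegal access is reported asynchronously, so the frame in sampler.py is not necessarily the faulting kernel. Below is a hedged sketch of how one might rerun the same configuration with synchronous kernel launches to localize it; CUDA_LAUNCH_BLOCKING is a standard CUDA/PyTorch environment variable, and the engine arguments mirror the repro sketch above.

```python
# Sketch: force synchronous CUDA kernel launches so the illegal memory access
# is raised at the offending launch site instead of at a later sync point
# (e.g. the temperature comparison in sampler.py seen in the log above).
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # standard CUDA env var
os.environ["VLLM_USE_V1"] = "1"

# Note: with the ray backend these variables may need to be exported in the
# shell before the Ray cluster/workers start in order to reach worker processes.
from vllm import LLM

# Constructing the engine is enough to trigger the crash: it happens in
# determine_available_memory() -> profile_run() before any request is served.
LLM(
    model="microsoft/Phi-3.5-MoE-instruct",
    tensor_parallel_size=1,
    pipeline_parallel_size=2,
    distributed_executor_backend="ray",
    load_format="dummy",
)
```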