
[Bug][Failing Test] 2-node-tests-4-gpus-in-total - distributed/test_pipeline_parallel.py::test_tp_* #18417

Closed
@markmc

Description

Your current environment

Still failing on main as of commit 9609327

🐛 Describe the bug

Failing test: https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests?branch=main&commit=Search&period=1day&query=test_tp_language_generation

FAILED distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup26-ray-1-auto-test_options26]
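
For local triage, a single-node approximation of the failing configuration might look like the sketch below. This is a hypothetical reproduction, not the CI harness: the CI job launches `vllm serve` across 2 nodes with the ray backend, and the exact setup comes from the kwargs in the traceback further down (tp_size=1, pp_size=2, eager_mode=False, load_format='dummy', v1 engine). The engine-arg names and the `VLLM_USE_V1` selection are assumptions based on the current vLLM offline API.

```python
# Hypothetical single-node reproduction sketch (not the CI harness itself).
import os

os.environ["VLLM_USE_V1"] = "1"  # the failing case runs the v1 engine

from vllm import LLM, SamplingParams

# Mirrors ParallelSetup(tp_size=1, pp_size=2, eager_mode=False,
# chunked_prefill=False) and PPTestOptions(load_format='dummy')
# from the test parametrization shown in the traceback below.
llm = LLM(
    model="microsoft/Phi-3.5-MoE-instruct",
    tensor_parallel_size=1,
    pipeline_parallel_size=2,
    distributed_executor_backend="ray",
    load_format="dummy",
    enforce_eager=False,
)

# The CI failure happens during engine startup (profile_run), so the crash
# would be expected before generate() returns anything.
print(llm.generate(["Hello"], SamplingParams(max_tokens=8)))
```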
Logs
[2025-05-20T05:24:25Z] (VllmWorker rank=0 pid=10229) WARNING 05-19 22:24:25 [fused_moe.py:682] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_L4.json
[2025-05-20T05:24:27Z] (VllmWorker rank=0 pid=10229) INFO 05-19 22:24:27 [monitor.py:33] torch.compile takes 10.50 s in total
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] WorkerProc hit an exception.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] Traceback (most recent call last):
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     output = func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]              ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     return func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     self.model_runner.profile_run()
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1856, in profile_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     sampler_output = self._dummy_sampler_run(hidden_states)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     return func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1757, in _dummy_sampler_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     raise e
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1747, in _dummy_sampler_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     sampler_output = self.sampler(logits=logits,
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     sampled = self.sample(logits, sampling_metadata)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 126, in sample
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     sampling_metadata.temperature < _SAMPLING_EPS,
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] Traceback (most recent call last):
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     output = func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]              ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     return func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     self.model_runner.profile_run()
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1856, in profile_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     sampler_output = self._dummy_sampler_run(hidden_states)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     return func(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1757, in _dummy_sampler_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     raise e
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1747, in _dummy_sampler_run
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     sampler_output = self.sampler(logits=logits,
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     sampled = self.sample(logits, sampling_metadata)
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 126, in sample
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     sampling_metadata.temperature < _SAMPLING_EPS,
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]
[2025-05-20T05:24:28Z] (VllmWorker rank=0 pid=10229) ERROR 05-19 22:24:28 [multiproc_executor.py:522]
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] EngineCore failed to start.
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] Traceback (most recent call last):
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 480, in run_engine_core
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]     engine_core = EngineCoreProc(*args, **kwargs)
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 379, in __init__
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]     super().__init__(vllm_config, executor_class, log_stats,
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 74, in __init__
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]     self._initialize_kv_caches(vllm_config)
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 133, in _initialize_kv_caches
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]     available_gpu_memory = self.model_executor.determine_available_memory()
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]     output = self.collective_rpc("determine_available_memory")
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]     result = get_response(w, dequeue_timeout)
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489]     raise RuntimeError(
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T05:24:28Z] ERROR 05-19 22:24:28 [core.py:489] ', please check the stack trace above for the root cause
[2025-05-20T05:24:29Z] ERROR 05-19 22:24:29 [multiproc_executor.py:135] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
[2025-05-20T05:24:29Z] Process EngineCore_0:
[2025-05-20T05:24:29Z] Traceback (most recent call last):
[2025-05-20T05:24:29Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-05-20T05:24:29Z]     self.run()
[2025-05-20T05:24:29Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-05-20T05:24:29Z]     self._target(*self._args, **self._kwargs)
[2025-05-20T05:24:29Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 493, in run_engine_core
[2025-05-20T05:24:29Z]     raise e
[2025-05-20T05:24:29Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 480, in run_engine_core
[2025-05-20T05:24:29Z]     engine_core = EngineCoreProc(*args, **kwargs)
[2025-05-20T05:24:29Z]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:29Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 379, in __init__
[2025-05-20T05:24:29Z]     super().__init__(vllm_config, executor_class, log_stats,
[2025-05-20T05:24:29Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 74, in __init__
[2025-05-20T05:24:29Z]     self._initialize_kv_caches(vllm_config)
[2025-05-20T05:24:29Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 133, in _initialize_kv_caches
[2025-05-20T05:24:29Z]     available_gpu_memory = self.model_executor.determine_available_memory()
[2025-05-20T05:24:29Z]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:29Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
[2025-05-20T05:24:29Z]     output = self.collective_rpc("determine_available_memory")
[2025-05-20T05:24:29Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:29Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
[2025-05-20T05:24:29Z]     result = get_response(w, dequeue_timeout)
[2025-05-20T05:24:29Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:29Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
[2025-05-20T05:24:29Z]     raise RuntimeError(
[2025-05-20T05:24:29Z] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
[2025-05-20T05:24:29Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T05:24:29Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T05:24:29Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T05:24:29Z] ', please check the stack trace above for the root cause
[2025-05-20T05:24:31Z] Traceback (most recent call last):
[2025-05-20T05:24:31Z]   File "/usr/local/bin/vllm", line 10, in <module>
[2025-05-20T05:24:31Z]     sys.exit(main())
[2025-05-20T05:24:31Z]              ^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 53, in main
[2025-05-20T05:24:31Z]     args.dispatch_function(args)
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 40, in cmd
[2025-05-20T05:24:31Z]     uvloop.run(run_server(args))
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
[2025-05-20T05:24:31Z]     return __asyncio.run(
[2025-05-20T05:24:31Z]            ^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
[2025-05-20T05:24:31Z]     return runner.run(main)
[2025-05-20T05:24:31Z]            ^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
[2025-05-20T05:24:31Z]     return self._loop.run_until_complete(task)
[2025-05-20T05:24:31Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
[2025-05-20T05:24:31Z]     return await main
[2025-05-20T05:24:31Z]            ^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1324, in run_server
[2025-05-20T05:24:31Z]     async with build_async_engine_client(args) as engine_client:
[2025-05-20T05:24:31Z]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[2025-05-20T05:24:31Z]     return await anext(self.gen)
[2025-05-20T05:24:31Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 153, in build_async_engine_client
[2025-05-20T05:24:31Z]     async with build_async_engine_client_from_engine_args(
[2025-05-20T05:24:31Z]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[2025-05-20T05:24:31Z]     return await anext(self.gen)
[2025-05-20T05:24:31Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 185, in build_async_engine_client_from_engine_args
[2025-05-20T05:24:31Z]     async_llm = AsyncLLM.from_vllm_config(
[2025-05-20T05:24:31Z]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 152, in from_vllm_config
[2025-05-20T05:24:31Z]     return cls(
[2025-05-20T05:24:31Z]            ^^^^
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 118, in __init__
[2025-05-20T05:24:31Z]     self.engine_core = core_client_class(
[2025-05-20T05:24:31Z]                        ^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 734, in __init__
[2025-05-20T05:24:31Z]     super().__init__(
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 418, in __init__
[2025-05-20T05:24:31Z]     self._wait_for_engine_startup(output_address, parallel_config)
[2025-05-20T05:24:31Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 484, in _wait_for_engine_startup
[2025-05-20T05:24:31Z]     raise RuntimeError("Engine core initialization failed. "
[2025-05-20T05:24:31Z] RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2025-05-20T05:24:32Z] Traceback (most recent call last):
[2025-05-20T05:24:32Z]   File "/vllm-workspace/tests/utils.py", line 727, in wrapper
[2025-05-20T05:24:32Z]     f(*args, **kwargs)
[2025-05-20T05:24:32Z]   File "/vllm-workspace/tests/distributed/test_pipeline_parallel.py", line 422, in test_tp_language_generation
[2025-05-20T05:24:32Z]     _compare_tp(model_id,
[2025-05-20T05:24:32Z]   File "/vllm-workspace/tests/distributed/test_pipeline_parallel.py", line 388, in _compare_tp
[2025-05-20T05:24:32Z]     compare_two_settings(model_id,
[2025-05-20T05:24:32Z]   File "/vllm-workspace/tests/utils.py", line 465, in compare_two_settings
[2025-05-20T05:24:32Z]     compare_all_settings(
[2025-05-20T05:24:32Z]   File "/vllm-workspace/tests/utils.py", line 529, in compare_all_settings
[2025-05-20T05:24:32Z]     with RemoteOpenAIServer(model,
[2025-05-20T05:24:32Z]          ^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T05:24:32Z]   File "/vllm-workspace/tests/utils.py", line 133, in __init__
[2025-05-20T05:24:32Z]     self._wait_for_server(url=self.url_for("health"),
[2025-05-20T05:24:32Z]   File "/vllm-workspace/tests/utils.py", line 161, in _wait_for_server
[2025-05-20T05:24:32Z]     raise RuntimeError("Server exited unexpectedly.") from None
[2025-05-20T05:24:32Z] RuntimeError: Server exited unexpectedly.
[2025-05-20T05:24:32Z] Fork a new process to run a test 8315
[2025-05-20T05:24:32Z] FAILED
[2025-05-20T05:24:32Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup27-mp-0-auto-test_options27] Fork a new process to run a test 10565
[2025-05-20T05:24:32Z] PASSED
[2025-05-20T05:24:33Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup28-mp-1-auto-test_options28] Fork a new process to run a test 10566
[2025-05-20T05:24:33Z] PASSED
[2025-05-20T05:24:33Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup29-ray-0-auto-test_options29] Fork a new process to run a test 10567
[2025-05-20T05:24:33Z] PASSED
[2025-05-20T05:24:33Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup30-ray-1-auto-test_options30] Fork a new process to run a test 10568
[2025-05-20T05:24:33Z] PASSED
[2025-05-20T05:24:33Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup31-mp-0-auto-test_options31] Fork a new process to run a test 10569
[2025-05-20T05:24:34Z] PASSED
[2025-05-20T05:24:34Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup32-mp-1-auto-test_options32] Fork a new process to run a test 10570
[2025-05-20T05:24:34Z] PASSED
[2025-05-20T05:24:34Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup33-ray-0-auto-test_options33] Fork a new process to run a test 10571
[2025-05-20T05:24:34Z] PASSED
[2025-05-20T05:24:34Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup34-ray-1-auto-test_options34] Fork a new process to run a test 10572
[2025-05-20T05:24:34Z] PASSED
[2025-05-20T05:24:34Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup35-mp-0-auto-test_options35] Fork a new process to run a test 10573
[2025-05-20T05:24:35Z] PASSED
[2025-05-20T05:24:35Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup36-mp-1-auto-test_options36] Fork a new process to run a test 10574
[2025-05-20T05:24:35Z] PASSED
[2025-05-20T05:24:35Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup37-ray-0-auto-test_options37] Fork a new process to run a test 10575
[2025-05-20T05:24:35Z] PASSED
[2025-05-20T05:24:35Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup38-ray-1-auto-test_options38] Fork a new process to run a test 10576
[2025-05-20T05:24:36Z] PASSED
[2025-05-20T05:24:36Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup39-mp-0-auto-test_options39] Fork a new process to run a test 10577
[2025-05-20T05:24:36Z] PASSED
[2025-05-20T05:24:36Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup40-mp-1-auto-test_options40] Fork a new process to run a test 10578
[2025-05-20T05:24:36Z] PASSED
[2025-05-20T05:24:36Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup41-ray-0-auto-test_options41] Fork a new process to run a test 10579
[2025-05-20T05:24:36Z] PASSED
[2025-05-20T05:24:36Z] distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup42-ray-1-auto-test_options42] Fork a new process to run a test 10580
[2025-05-20T05:24:37Z] PASSED
[2025-05-20T05:24:37Z] distributed/test_pipeline_parallel.py::test_tp_language_embedding[intfloat/e5-mistral-7b-instruct-parallel_setup0-mp-0-auto-test_options0] Fork a new process to run a test 10581
[2025-05-20T05:24:37Z] PASSED
[2025-05-20T05:24:37Z] distributed/test_pipeline_parallel.py::test_tp_language_embedding[BAAI/bge-multilingual-gemma2-parallel_setup1-mp-0-auto-test_options1] Fork a new process to run a test 10582
[2025-05-20T05:24:37Z] PASSED
[2025-05-20T05:24:37Z] distributed/test_pipeline_parallel.py::test_tp_multimodal_generation[OpenGVLab/InternVL2-1B-parallel_setup0-mp-0-auto-test_options0] Fork a new process to run a test 10583
[2025-05-20T05:24:37Z] PASSED
[2025-05-20T05:24:38Z] distributed/test_pipeline_parallel.py::test_tp_multimodal_generation[microsoft/Phi-3.5-vision-instruct-parallel_setup1-mp-0-auto-test_options1] Fork a new process to run a test 10584
[2025-05-20T05:24:38Z] PASSED
[2025-05-20T05:24:38Z] distributed/test_pipeline_parallel.py::test_tp_multimodal_generation[fixie-ai/ultravox-v0_5-llama-3_2-1b-parallel_setup2-mp-0-auto-test_options2] Fork a new process to run a test 10585
[2025-05-20T05:24:38Z] PASSED
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] =================================== FAILURES ===================================
[2025-05-20T05:24:38Z] _ test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup26-ray-1-auto-test_options26] _
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] args = ()
[2025-05-20T05:24:38Z] kwargs = {'distributed_backend': 'ray', 'model_id': 'microsoft/Phi-3.5-MoE-instruct', 'num_gpus_available': 2, 'parallel_setup': ParallelSetup(tp_size=1, pp_size=2, eager_mode=False, chunked_prefill=False), ...}
[2025-05-20T05:24:38Z] Skipped = <class 'Skipped'>, pid = 8315, pgid = 3893, _pid = 8315
[2025-05-20T05:24:38Z] _exitcode = 256, old_signal_handler = <Handlers.SIG_DFL: 0>
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z]     @functools.wraps(f)
[2025-05-20T05:24:38Z]     def wrapper(*args: _P.args, **kwargs: _P.kwargs) -> None:
[2025-05-20T05:24:38Z]         # Make the process the leader of its own process group
[2025-05-20T05:24:38Z]         # to avoid sending SIGTERM to the parent process
[2025-05-20T05:24:38Z]         os.setpgrp()
[2025-05-20T05:24:38Z]         from _pytest.outcomes import Skipped
[2025-05-20T05:24:38Z]         pid = os.fork()
[2025-05-20T05:24:38Z]         print(f"Fork a new process to run a test {pid}")
[2025-05-20T05:24:38Z]         if pid == 0:
[2025-05-20T05:24:38Z]             try:
[2025-05-20T05:24:38Z]                 f(*args, **kwargs)
[2025-05-20T05:24:38Z]             except Skipped as e:
[2025-05-20T05:24:38Z]                 # convert Skipped to exit code 0
[2025-05-20T05:24:38Z]                 print(str(e))
[2025-05-20T05:24:38Z]                 os._exit(0)
[2025-05-20T05:24:38Z]             except Exception:
[2025-05-20T05:24:38Z]                 import traceback
[2025-05-20T05:24:38Z]                 traceback.print_exc()
[2025-05-20T05:24:38Z]                 os._exit(1)
[2025-05-20T05:24:38Z]             else:
[2025-05-20T05:24:38Z]                 os._exit(0)
[2025-05-20T05:24:38Z]         else:
[2025-05-20T05:24:38Z]             pgid = os.getpgid(pid)
[2025-05-20T05:24:38Z]             _pid, _exitcode = os.waitpid(pid, 0)
[2025-05-20T05:24:38Z]             # ignore SIGTERM signal itself
[2025-05-20T05:24:38Z]             old_signal_handler = signal.signal(signal.SIGTERM, signal.SIG_IGN)
[2025-05-20T05:24:38Z]             # kill all child processes
[2025-05-20T05:24:38Z]             os.killpg(pgid, signal.SIGTERM)
[2025-05-20T05:24:38Z]             # restore the signal handler
[2025-05-20T05:24:38Z]             signal.signal(signal.SIGTERM, old_signal_handler)
[2025-05-20T05:24:38Z] >           assert _exitcode == 0, (f"function {f} failed when called with"
[2025-05-20T05:24:38Z]                                     f" args {args} and kwargs {kwargs}")
[2025-05-20T05:24:38Z] E           AssertionError: function <function test_tp_language_generation at 0x7f24d8990360> failed when called with args () and kwargs {'model_id': 'microsoft/Phi-3.5-MoE-instruct', 'parallel_setup': ParallelSetup(tp_size=1, pp_size=2, eager_mode=False, chunked_prefill=False), 'distributed_backend': 'ray', 'vllm_major_version': '1', 'task': 'auto', 'test_options': PPTestOptions(multi_node_only=True, load_format='dummy'), 'num_gpus_available': 2}
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] utils.py:747: AssertionError
[2025-05-20T05:24:38Z] =============================== warnings summary ===============================
[2025-05-20T05:24:38Z] ../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
[2025-05-20T05:24:38Z]   /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
[2025-05-20T05:24:38Z]     ref_error: type[Exception] = jsonschema.RefResolutionError,
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] tests/distributed/test_pipeline_parallel.py: 48 warnings
[2025-05-20T05:24:38Z]   /vllm-workspace/tests/utils.py:723: DeprecationWarning: This process (pid=3893) is multi-threaded, use of fork() may lead to deadlocks in the child.
[2025-05-20T05:24:38Z]     pid = os.fork()
[2025-05-20T05:24:38Z]
[2025-05-20T05:24:38Z] -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
[2025-05-20T05:24:38Z] =========================== short test summary info ============================
[2025-05-20T05:24:38Z] FAILED distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup26-ray-1-auto-test_options26]
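
Since the illegal memory access is reported asynchronously (the log itself warns the stack trace may be misattributed), the faulting kernel is not necessarily the temperature comparison in v1/sample/sampler.py where the error surfaces. A hedged next step when rerunning locally, following the hint printed in the log, is to force synchronous kernel launches so the error is raised at the offending launch:

```python
# Hedged debugging sketch following the CUDA hint printed in the log above.
# CUDA_LAUNCH_BLOCKING=1 must be set before CUDA is initialized (i.e. before
# importing torch/vllm); with multiproc or ray workers, export it in the
# environment that launches the server so the worker processes inherit it.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous kernel launches

# ...then rerun the reproduction sketch above; the illegal memory access
# should now point at the kernel that actually faults during profile_run.
```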

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working), ci-failure (Issue about an unexpected test failure in CI)
Status: Done
