Closed
Labels
bug (Something isn't working), ci-failure (Issue about an unexpected test failure in CI)
Description
Your current environment
N/A
🐛 Describe the bug
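models/multimodal/generation/test_common.py::test_single_image_models[gemma3-test_case91]
is failing on main with another CUDA illegal memory access error. The traceback points at the block table copy in block_table.py (commit), but as the error message itself notes, CUDA errors are reported asynchronously, so the faulting kernel may have been launched earlier.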
https://buildkite.com/vllm/ci/builds/20503/steps?jid=0196f626-d4d6-4af6-b10f-da8c3145ddfc
Stack:
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:68] Dumping input data
--- Logging error ---
[2025-05-22T05:33:18Z] Traceback (most recent call last):
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in execute_model
[2025-05-22T05:33:18Z] return self.model_executor.execute_model(scheduler_output)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
[2025-05-22T05:33:18Z] output = self.collective_rpc("execute_model",
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-05-22T05:33:18Z] answer = run_method(self.driver_worker, method, args, kwargs)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
[2025-05-22T05:33:18Z] return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z] return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
[2025-05-22T05:33:18Z] output = self.model_runner.execute_model(scheduler_output,
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z] return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1121, in execute_model
[2025-05-22T05:33:18Z] self._prepare_inputs(scheduler_output))
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 528, in _prepare_inputs
[2025-05-22T05:33:18Z] self.input_batch.block_table.commit(num_reqs)
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 81, in commit
[2025-05-22T05:33:18Z] self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
[2025-05-22T05:33:18Z] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-22T05:33:18Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-22T05:33:18Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-22T05:33:18Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-22T05:33:18Z]
[2025-05-22T05:33:18Z]
[2025-05-22T05:33:18Z] During handling of the above exception, another exception occurred:
[2025-05-22T05:33:18Z]
[2025-05-22T05:33:18Z] Traceback (most recent call last):
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/logging/__init__.py", line 1160, in emit
[2025-05-22T05:33:18Z] msg = self.format(record)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/logging/__init__.py", line 999, in format
[2025-05-22T05:33:18Z] return fmt.format(record)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/formatter.py", line 13, in format
[2025-05-22T05:33:18Z] msg = logging.Formatter.format(self, record)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/logging/__init__.py", line 703, in format
[2025-05-22T05:33:18Z] record.message = record.getMessage()
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/logging/__init__.py", line 392, in getMessage
[2025-05-22T05:33:18Z] msg = msg % self.args
[2025-05-22T05:33:18Z] ~~~~^~~~~~~~~~~
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 4488, in __str__
[2025-05-22T05:33:18Z] f"compilation_config={self.compilation_config!r}")
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 3872, in __repr__
[2025-05-22T05:33:18Z] for k, v in asdict(self).items():
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/dataclasses.py", line 1329, in asdict
[2025-05-22T05:33:18Z] return _asdict_inner(obj, dict_factory)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/dataclasses.py", line 1339, in _asdict_inner
[2025-05-22T05:33:18Z] f.name: _asdict_inner(getattr(obj, f.name), dict)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/dataclasses.py", line 1382, in _asdict_inner
[2025-05-22T05:33:18Z] return type(obj)((_asdict_inner(k, dict_factory),
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/dataclasses.py", line 1383, in <genexpr>
[2025-05-22T05:33:18Z] _asdict_inner(v, dict_factory))
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/dataclasses.py", line 1386, in _asdict_inner
[2025-05-22T05:33:18Z] return copy.deepcopy(obj)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/copy.py", line 162, in deepcopy
[2025-05-22T05:33:18Z] y = _reconstruct(x, memo, *rv)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/copy.py", line 259, in _reconstruct
[2025-05-22T05:33:18Z] state = deepcopy(state, memo)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/copy.py", line 136, in deepcopy
[2025-05-22T05:33:18Z] y = copier(x, memo)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/copy.py", line 221, in _deepcopy_dict
[2025-05-22T05:33:18Z] y[deepcopy(key, memo)] = deepcopy(value, memo)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/copy.py", line 143, in deepcopy
[2025-05-22T05:33:18Z] y = copier(memo)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/torch/_tensor.py", line 172, in __deepcopy__
[2025-05-22T05:33:18Z] new_storage = self._typed_storage()._deepcopy(memo)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 1134, in _deepcopy
[2025-05-22T05:33:18Z] return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/copy.py", line 143, in deepcopy
[2025-05-22T05:33:18Z] y = copier(memo)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 239, in __deepcopy__
[2025-05-22T05:33:18Z] new_storage = self.clone()
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 253, in clone
[2025-05-22T05:33:18Z] return type(self)(self.nbytes(), device=self.device).copy_(self)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-22T05:33:18Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-22T05:33:18Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-22T05:33:18Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-22T05:33:18Z]
[2025-05-22T05:33:18Z] Call stack:
[2025-05-22T05:33:18Z] File "<string>", line 1, in <module>
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main
[2025-05-22T05:33:18Z] exitcode = _main(fd, parent_sentinel)
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/multiprocessing/spawn.py", line 135, in _main
[2025-05-22T05:33:18Z] return self._bootstrap(parent_sentinel)
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-05-22T05:33:18Z] self.run()
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-05-22T05:33:18Z] self._target(*self._args, **self._kwargs)
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 486, in run_engine_core
[2025-05-22T05:33:18Z] engine_core.run_busy_loop()
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 513, in run_busy_loop
[2025-05-22T05:33:18Z] self._process_engine_step()
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 538, in _process_engine_step
[2025-05-22T05:33:18Z] outputs = self.step_fn()
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 226, in step
[2025-05-22T05:33:18Z] model_output = self.execute_model(scheduler_output)
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 210, in execute_model
[2025-05-22T05:33:18Z] dump_engine_exception(self.vllm_config, scheduler_output,
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/dump_input.py", line 62, in dump_engine_exception
[2025-05-22T05:33:18Z] _dump_engine_exception(config, scheduler_output, scheduler_stats)
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/dump_input.py", line 70, in _dump_engine_exception
[2025-05-22T05:33:18Z] logger.error(
[2025-05-22T05:33:18Z] Unable to print the message and arguments - possible formatting error.
[2025-05-22T05:33:18Z] Use the traceback above to help find the error.
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:78] Dumping scheduler output for model execution:
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=0,prompt_token_ids_len=281,mm_inputs=[{'pixel_values': tensor([[[[-0.6314, -0.6314, -0.6314, ..., 0.5922, 0.5451, 0.5373],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [-0.6314, -0.6314, -0.6314, ..., 0.5922, 0.5451, 0.5373],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [-0.6314, -0.6314, -0.6314, ..., 0.5529, 0.5059, 0.4980],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] ...,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [ 0.3176, 0.3176, 0.3020, ..., 0.5294, 0.5373, 0.5373],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [ 0.3176, 0.3176, 0.3020, ..., 0.5294, 0.5373, 0.5373],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [ 0.3176, 0.3176, 0.3020, ..., 0.5294, 0.5373, 0.5373]],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [[-0.8980, -0.8980, -0.8980, ..., 0.5216, 0.4431, 0.4353],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [-0.8980, -0.8980, -0.8980, ..., 0.5216, 0.4431, 0.4353],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [-0.8980, -0.8980, -0.8980, ..., 0.4588, 0.3882, 0.3804],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] ...,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [ 0.3647, 0.3647, 0.3490, ..., 0.5451, 0.5529, 0.5529],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [ 0.3647, 0.3647, 0.3490, ..., 0.5451, 0.5529, 0.5529],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [ 0.3647, 0.3647, 0.3490, ..., 0.5451, 0.5529, 0.5529]],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [[-0.9686, -0.9686, -0.9686, ..., 0.4510, 0.3490, 0.3333],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [-0.9686, -0.9686, -0.9686, ..., 0.4510, 0.3490, 0.3333],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [-0.9686, -0.9686, -0.9686, ..., 0.3725, 0.2784, 0.2627],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] ...,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [ 0.2863, 0.2863, 0.2706, ..., 0.4431, 0.4510, 0.4510],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [ 0.2863, 0.2863, 0.2706, ..., 0.4431, 0.4510, 0.4510],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] [ 0.2863, 0.2863, 0.2706, ..., 0.4431, 0.4510, 0.4510]]]]), 'num_crops': tensor([0])}],mm_hashes=['f60a83610bcc902af2e0be4780926de06a310afae0d11f9d2feee331134ff15a'],mm_positions=[PlaceholderRange(offset=4, length=260, is_embed=tensor([False, False, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, True, True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] True, True, True, True, True, True, True, True, False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=5, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18],num_computed_tokens=0,lora_request=None)],scheduled_cached_reqs=[],num_scheduled_tokens={0: 281},total_num_scheduled_tokens=281,scheduled_spec_decode_tokens={},scheduled_encoder_inputs={0: [0]},num_common_prefix_blocks=18,finished_req_ids=[],free_encoder_input_ids=[],structured_output_request_ids={},grammar_bitmask=null,kv_connector_metadata=null)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] EngineCore encountered a fatal error.
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] Traceback (most recent call last):
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 486, in run_engine_core
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] engine_core.run_busy_loop()
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 513, in run_busy_loop
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] self._process_engine_step()
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 538, in _process_engine_step
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] outputs = self.step_fn()
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 226, in step
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] model_output = self.execute_model(scheduler_output)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 213, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] raise err
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] return self.model_executor.execute_model(scheduler_output)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] output = self.collective_rpc("execute_model",
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] answer = run_method(self.driver_worker, method, args, kwargs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] output = self.model_runner.execute_model(scheduler_output,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1121, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] self._prepare_inputs(scheduler_output))
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 528, in _prepare_inputs
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] self.input_batch.block_table.commit(num_reqs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 81, in commit
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]
[2025-05-22T05:33:18Z] Process EngineCore_0:
[2025-05-22T05:33:18Z] Traceback (most recent call last):
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-05-22T05:33:18Z] self.run()
[2025-05-22T05:33:18Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-05-22T05:33:18Z] self._target(*self._args, **self._kwargs)
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 497, in run_engine_core
[2025-05-22T05:33:18Z] raise e
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 486, in run_engine_core
[2025-05-22T05:33:18Z] engine_core.run_busy_loop()
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 513, in run_busy_loop
[2025-05-22T05:33:18Z] self._process_engine_step()
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 538, in _process_engine_step
[2025-05-22T05:33:18Z] outputs = self.step_fn()
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 226, in step
[2025-05-22T05:33:18Z] model_output = self.execute_model(scheduler_output)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 213, in execute_model
[2025-05-22T05:33:18Z] raise err
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in execute_model
[2025-05-22T05:33:18Z] return self.model_executor.execute_model(scheduler_output)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
[2025-05-22T05:33:18Z] output = self.collective_rpc("execute_model",
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-05-22T05:33:18Z] answer = run_method(self.driver_worker, method, args, kwargs)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
[2025-05-22T05:33:18Z] return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z] return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
[2025-05-22T05:33:18Z] output = self.model_runner.execute_model(scheduler_output,
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z] return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1121, in execute_model
[2025-05-22T05:33:18Z] self._prepare_inputs(scheduler_output))
[2025-05-22T05:33:18Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 528, in _prepare_inputs
[2025-05-22T05:33:18Z] self.input_batch.block_table.commit(num_reqs)
[2025-05-22T05:33:18Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 81, in commit
[2025-05-22T05:33:18Z] self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
[2025-05-22T05:33:18Z] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-22T05:33:18Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-22T05:33:18Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-22T05:33:18Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-22T05:33:18Z]
[2025-05-22T05:33:18Z]
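The error message above suggests passing CUDA_LAUNCH_BLOCKING=1 so that the traceback points at the call that actually faulted. A minimal sketch of rerunning only the failing test that way is shown below. The helper names (`build_debug_env`, `build_pytest_command`) are hypothetical, and it assumes the script is started from the vLLM `tests/` directory on a machine with a GPU; only the test ID and the environment variable come from the report.

```python
import os
import subprocess
import sys

# Test ID taken from the report above.
TEST_ID = (
    "models/multimodal/generation/test_common.py"
    "::test_single_image_models[gemma3-test_case91]"
)


def build_debug_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Suggested by the PyTorch error message: kernel launches become blocking,
    # so the Python traceback should point at the call that actually faulted.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def build_pytest_command(test_id=TEST_ID):
    """Build the pytest invocation for a single test case (hypothetical helper)."""
    # -x stops at the first failure; -s disables output capture so engine logs are visible.
    return [sys.executable, "-m", "pytest", "-x", "-s", test_id]


if __name__ == "__main__":
    # Assumption: the working directory is the vLLM `tests/` directory.
    result = subprocess.run(build_pytest_command(), env=build_debug_env())
    sys.exit(result.returncode)
```

With blocking launches the run is slower, but the reported frame is more likely to be the real source of the illegal access rather than the later `copy_` call in `block_table.py`.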