
[Bug][Failing Test]: Multi-Modal Models 3 - models/multimodal/generation/test_common.py #18528

@DarkLight1337


Your current environment

N/A

🐛 Describe the bug

models/multimodal/generation/test_common.py::test_single_image_models[gemma3-test_case91] is failing on main with another CUDA illegal memory access error.

https://buildkite.com/vllm/ci/builds/20503/steps?jid=0196f626-d4d6-4af6-b10f-da8c3145ddfc
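As the error message itself suggests, a more accurate stack trace can usually be obtained by rerunning only the failing case with CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous. This is a minimal sketch. It assumes pytest is installed and that it is run from vLLM's tests/ directory, since the test path above appears to be relative to that directory. build_repro_command and run_repro are hypothetical helpers, not part of vLLM:

```python
import os
import subprocess
import sys

# Failing test ID from the report. Assumption: pytest is invoked from vLLM's
# tests/ directory, which the CI path appears to be relative to.
TEST_ID = ("models/multimodal/generation/test_common.py"
           "::test_single_image_models[gemma3-test_case91]")


def build_repro_command(test_id: str) -> tuple[list[str], dict[str, str]]:
    """Hypothetical helper: argv and environment for a synchronous-CUDA rerun."""
    env = dict(os.environ)
    # Synchronous kernel launches make the Python stack point at the kernel
    # that actually faulted. This slows everything down; debugging only.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # -x: stop at the first failure; -s: show the engine's stdout/stderr live.
    argv = [sys.executable, "-m", "pytest", "-x", "-s", test_id]
    return argv, env


def run_repro(test_id: str = TEST_ID) -> int:
    """Run the single failing case and return pytest's exit code."""
    argv, env = build_repro_command(test_id)
    return subprocess.run(argv, env=env, check=False).returncode
```

The variable is set in the pytest process's environment, so child processes it starts, including the spawned EngineCore process seen in the trace, inherit it.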

Stack trace:

[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:68] Dumping input data
--- Logging error ---
[2025-05-22T05:33:18Z] Traceback (most recent call last):
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in execute_model
[2025-05-22T05:33:18Z]     return self.model_executor.execute_model(scheduler_output)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
[2025-05-22T05:33:18Z]     output = self.collective_rpc("execute_model",
[2025-05-22T05:33:18Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-05-22T05:33:18Z]     answer = run_method(self.driver_worker, method, args, kwargs)
[2025-05-22T05:33:18Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
[2025-05-22T05:33:18Z]     return func(*args, **kwargs)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z]     return func(*args, **kwargs)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
[2025-05-22T05:33:18Z]     output = self.model_runner.execute_model(scheduler_output,
[2025-05-22T05:33:18Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z]     return func(*args, **kwargs)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1121, in execute_model
[2025-05-22T05:33:18Z]     self._prepare_inputs(scheduler_output))
[2025-05-22T05:33:18Z]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 528, in _prepare_inputs
[2025-05-22T05:33:18Z]     self.input_batch.block_table.commit(num_reqs)
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 81, in commit
[2025-05-22T05:33:18Z]     self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
[2025-05-22T05:33:18Z] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-22T05:33:18Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-22T05:33:18Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-22T05:33:18Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-22T05:33:18Z] 
[2025-05-22T05:33:18Z] 
[2025-05-22T05:33:18Z] During handling of the above exception, another exception occurred:
[2025-05-22T05:33:18Z] 
[2025-05-22T05:33:18Z] Traceback (most recent call last):
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/logging/__init__.py", line 1160, in emit
[2025-05-22T05:33:18Z]     msg = self.format(record)
[2025-05-22T05:33:18Z]           ^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/logging/__init__.py", line 999, in format
[2025-05-22T05:33:18Z]     return fmt.format(record)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/formatter.py", line 13, in format
[2025-05-22T05:33:18Z]     msg = logging.Formatter.format(self, record)
[2025-05-22T05:33:18Z]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/logging/__init__.py", line 703, in format
[2025-05-22T05:33:18Z]     record.message = record.getMessage()
[2025-05-22T05:33:18Z]                      ^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/logging/__init__.py", line 392, in getMessage
[2025-05-22T05:33:18Z]     msg = msg % self.args
[2025-05-22T05:33:18Z]           ~~~~^~~~~~~~~~~
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 4488, in __str__
[2025-05-22T05:33:18Z]     f"compilation_config={self.compilation_config!r}")
[2025-05-22T05:33:18Z]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 3872, in __repr__
[2025-05-22T05:33:18Z]     for k, v in asdict(self).items():
[2025-05-22T05:33:18Z]                 ^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/dataclasses.py", line 1329, in asdict
[2025-05-22T05:33:18Z]     return _asdict_inner(obj, dict_factory)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/dataclasses.py", line 1339, in _asdict_inner
[2025-05-22T05:33:18Z]     f.name: _asdict_inner(getattr(obj, f.name), dict)
[2025-05-22T05:33:18Z]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/dataclasses.py", line 1382, in _asdict_inner
[2025-05-22T05:33:18Z]     return type(obj)((_asdict_inner(k, dict_factory),
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/dataclasses.py", line 1383, in <genexpr>
[2025-05-22T05:33:18Z]     _asdict_inner(v, dict_factory))
[2025-05-22T05:33:18Z]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/dataclasses.py", line 1386, in _asdict_inner
[2025-05-22T05:33:18Z]     return copy.deepcopy(obj)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/copy.py", line 162, in deepcopy
[2025-05-22T05:33:18Z]     y = _reconstruct(x, memo, *rv)
[2025-05-22T05:33:18Z]         ^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/copy.py", line 259, in _reconstruct
[2025-05-22T05:33:18Z]     state = deepcopy(state, memo)
[2025-05-22T05:33:18Z]             ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/copy.py", line 136, in deepcopy
[2025-05-22T05:33:18Z]     y = copier(x, memo)
[2025-05-22T05:33:18Z]         ^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/copy.py", line 221, in _deepcopy_dict
[2025-05-22T05:33:18Z]     y[deepcopy(key, memo)] = deepcopy(value, memo)
[2025-05-22T05:33:18Z]                              ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/copy.py", line 143, in deepcopy
[2025-05-22T05:33:18Z]     y = copier(memo)
[2025-05-22T05:33:18Z]         ^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/torch/_tensor.py", line 172, in __deepcopy__
[2025-05-22T05:33:18Z]     new_storage = self._typed_storage()._deepcopy(memo)
[2025-05-22T05:33:18Z]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 1134, in _deepcopy
[2025-05-22T05:33:18Z]     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
[2025-05-22T05:33:18Z]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/copy.py", line 143, in deepcopy
[2025-05-22T05:33:18Z]     y = copier(memo)
[2025-05-22T05:33:18Z]         ^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 239, in __deepcopy__
[2025-05-22T05:33:18Z]     new_storage = self.clone()
[2025-05-22T05:33:18Z]                   ^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 253, in clone
[2025-05-22T05:33:18Z]     return type(self)(self.nbytes(), device=self.device).copy_(self)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-22T05:33:18Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-22T05:33:18Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-22T05:33:18Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-22T05:33:18Z] 
[2025-05-22T05:33:18Z] Call stack:
[2025-05-22T05:33:18Z]   File "<string>", line 1, in <module>
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main
[2025-05-22T05:33:18Z]     exitcode = _main(fd, parent_sentinel)
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/multiprocessing/spawn.py", line 135, in _main
[2025-05-22T05:33:18Z]     return self._bootstrap(parent_sentinel)
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-05-22T05:33:18Z]     self.run()
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-05-22T05:33:18Z]     self._target(*self._args, **self._kwargs)
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 486, in run_engine_core
[2025-05-22T05:33:18Z]     engine_core.run_busy_loop()
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 513, in run_busy_loop
[2025-05-22T05:33:18Z]     self._process_engine_step()
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 538, in _process_engine_step
[2025-05-22T05:33:18Z]     outputs = self.step_fn()
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 226, in step
[2025-05-22T05:33:18Z]     model_output = self.execute_model(scheduler_output)
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 210, in execute_model
[2025-05-22T05:33:18Z]     dump_engine_exception(self.vllm_config, scheduler_output,
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/dump_input.py", line 62, in dump_engine_exception
[2025-05-22T05:33:18Z]     _dump_engine_exception(config, scheduler_output, scheduler_stats)
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/dump_input.py", line 70, in _dump_engine_exception
[2025-05-22T05:33:18Z]     logger.error(
[2025-05-22T05:33:18Z] Unable to print the message and arguments - possible formatting error.
[2025-05-22T05:33:18Z] Use the traceback above to help find the error.
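The "Logging error" above is a secondary failure. While dumping the failed step, dump_input.py formats the VllmConfig. Its __str__ calls the __repr__ of compilation_config, which uses dataclasses.asdict(). asdict() recurses into dataclasses, lists, tuples and dicts, but it deep-copies every other leaf value. For the tensor reached this way, that deep copy becomes a CUDA storage clone. The clone fails again because an illegal memory access is a sticky error that leaves the CUDA context unusable.

The sketch below reproduces that mechanism using only the standard library. DeviceBuffer and ToyCompilationConfig are hypothetical stand-ins. repr_shallow shows one possible way to avoid the copy and is not vLLM's actual fix:

```python
import dataclasses
from dataclasses import dataclass, field


class DeviceBuffer:
    """Hypothetical stand-in for a CUDA tensor on a device in an error state."""

    def __deepcopy__(self, memo):
        # A real CUDA tensor would try to clone its storage on the device here.
        raise RuntimeError("CUDA error: an illegal memory access was encountered")

    def __repr__(self) -> str:
        return "DeviceBuffer(...)"


@dataclass
class ToyCompilationConfig:
    level: int = 3
    # Hypothetical dict-valued field holding a device object, similar to the
    # tensor reached from compilation_config in the trace.
    buffers: dict = field(default_factory=lambda: {"static": DeviceBuffer()})


def repr_via_asdict(cfg) -> str:
    # asdict() calls copy.deepcopy() on non-container leaf values, so the
    # device object's __deepcopy__ runs and raises.
    return repr(dataclasses.asdict(cfg))


def repr_shallow(cfg) -> str:
    # Read fields directly: no copying, only each value's own __repr__.
    parts = [f"{f.name}={getattr(cfg, f.name)!r}" for f in dataclasses.fields(cfg)]
    return f"{type(cfg).__name__}({', '.join(parts)})"
```

With this toy config, repr_via_asdict raises RuntimeError. repr_shallow returns "ToyCompilationConfig(level=3, buffers={'static': DeviceBuffer(...)})".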
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:78] Dumping scheduler output for model execution:
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=0,prompt_token_ids_len=281,mm_inputs=[{'pixel_values': tensor([[[[-0.6314, -0.6314, -0.6314,  ...,  0.5922,  0.5451,  0.5373],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [-0.6314, -0.6314, -0.6314,  ...,  0.5922,  0.5451,  0.5373],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [-0.6314, -0.6314, -0.6314,  ...,  0.5529,  0.5059,  0.4980],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           ...,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [ 0.3176,  0.3176,  0.3020,  ...,  0.5294,  0.5373,  0.5373],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [ 0.3176,  0.3176,  0.3020,  ...,  0.5294,  0.5373,  0.5373],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [ 0.3176,  0.3176,  0.3020,  ...,  0.5294,  0.5373,  0.5373]],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] 
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          [[-0.8980, -0.8980, -0.8980,  ...,  0.5216,  0.4431,  0.4353],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [-0.8980, -0.8980, -0.8980,  ...,  0.5216,  0.4431,  0.4353],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [-0.8980, -0.8980, -0.8980,  ...,  0.4588,  0.3882,  0.3804],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           ...,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [ 0.3647,  0.3647,  0.3490,  ...,  0.5451,  0.5529,  0.5529],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [ 0.3647,  0.3647,  0.3490,  ...,  0.5451,  0.5529,  0.5529],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [ 0.3647,  0.3647,  0.3490,  ...,  0.5451,  0.5529,  0.5529]],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79] 
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          [[-0.9686, -0.9686, -0.9686,  ...,  0.4510,  0.3490,  0.3333],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [-0.9686, -0.9686, -0.9686,  ...,  0.4510,  0.3490,  0.3333],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [-0.9686, -0.9686, -0.9686,  ...,  0.3725,  0.2784,  0.2627],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           ...,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [ 0.2863,  0.2863,  0.2706,  ...,  0.4431,  0.4510,  0.4510],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [ 0.2863,  0.2863,  0.2706,  ...,  0.4431,  0.4510,  0.4510],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]           [ 0.2863,  0.2863,  0.2706,  ...,  0.4431,  0.4510,  0.4510]]]]), 'num_crops': tensor([0])}],mm_hashes=['f60a83610bcc902af2e0be4780926de06a310afae0d11f9d2feee331134ff15a'],mm_positions=[PlaceholderRange(offset=4, length=260, is_embed=tensor([False, False,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [dump_input.py:79]          True,  True,  True,  True,  True,  True,  True,  True, False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=5, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18],num_computed_tokens=0,lora_request=None)],scheduled_cached_reqs=[],num_scheduled_tokens={0: 281},total_num_scheduled_tokens=281,scheduled_spec_decode_tokens={},scheduled_encoder_inputs={0: [0]},num_common_prefix_blocks=18,finished_req_ids=[],free_encoder_input_ids=[],structured_output_request_ids={},grammar_bitmask=null,kv_connector_metadata=null)
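The scheduler dump shows a single new request with a 281-token prompt and one image placeholder, PlaceholderRange(offset=4, length=260, is_embed=...). The printed mask begins with two False entries and ends with two False entries, with True everywhere in between. That means 256 of the 260 placeholder positions receive image embeddings, which is consistent with Gemma 3 using 256 tokens per image.

The sketch below rebuilds that mask from the log in plain Python, without torch, and computes which absolute prompt positions receive embeddings. embed_positions is a hypothetical helper, not vLLM code:

```python
# Values copied from the SchedulerOutput dump above.
PROMPT_LEN = 281
OFFSET = 4
LENGTH = 260

# is_embed as printed in the log: 2 x False, 256 x True, 2 x False.
is_embed = [False] * 2 + [True] * 256 + [False] * 2


def embed_positions(offset: int, mask: list[bool]) -> list[int]:
    """Absolute prompt positions that receive encoder (image) embeddings."""
    return [offset + i for i, flag in enumerate(mask) if flag]


positions = embed_positions(OFFSET, is_embed)
assert len(is_embed) == LENGTH
# The whole placeholder range must lie inside the prompt.
assert OFFSET + LENGTH <= PROMPT_LEN
print(len(positions), positions[0], positions[-1])  # → 256 6 261
```

The placeholder therefore spans prompt positions 4 to 263, and it sits entirely within the 281 scheduled tokens.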
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] EngineCore encountered a fatal error.
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] Traceback (most recent call last):
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 486, in run_engine_core
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     engine_core.run_busy_loop()
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 513, in run_busy_loop
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     self._process_engine_step()
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 538, in _process_engine_step
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     outputs = self.step_fn()
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]               ^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 226, in step
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     model_output = self.execute_model(scheduler_output)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 213, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     raise err
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     return self.model_executor.execute_model(scheduler_output)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     output = self.collective_rpc("execute_model",
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     answer = run_method(self.driver_worker, method, args, kwargs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     output = self.model_runner.execute_model(scheduler_output,
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     return func(*args, **kwargs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1121, in execute_model
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     self._prepare_inputs(scheduler_output))
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 528, in _prepare_inputs
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     self.input_batch.block_table.commit(num_reqs)
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 81, in commit
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495]     self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-22T05:33:18Z] ERROR 05-21 22:33:18 [core.py:495] 
[2025-05-22T05:33:18Z] Process EngineCore_0:
[2025-05-22T05:33:18Z] Traceback (most recent call last):
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-05-22T05:33:18Z]     self.run()
[2025-05-22T05:33:18Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-05-22T05:33:18Z]     self._target(*self._args, **self._kwargs)
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 497, in run_engine_core
[2025-05-22T05:33:18Z]     raise e
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 486, in run_engine_core
[2025-05-22T05:33:18Z]     engine_core.run_busy_loop()
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 513, in run_busy_loop
[2025-05-22T05:33:18Z]     self._process_engine_step()
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 538, in _process_engine_step
[2025-05-22T05:33:18Z]     outputs = self.step_fn()
[2025-05-22T05:33:18Z]               ^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 226, in step
[2025-05-22T05:33:18Z]     model_output = self.execute_model(scheduler_output)
[2025-05-22T05:33:18Z]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 213, in execute_model
[2025-05-22T05:33:18Z]     raise err
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in execute_model
[2025-05-22T05:33:18Z]     return self.model_executor.execute_model(scheduler_output)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
[2025-05-22T05:33:18Z]     output = self.collective_rpc("execute_model",
[2025-05-22T05:33:18Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-05-22T05:33:18Z]     answer = run_method(self.driver_worker, method, args, kwargs)
[2025-05-22T05:33:18Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
[2025-05-22T05:33:18Z]     return func(*args, **kwargs)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z]     return func(*args, **kwargs)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
[2025-05-22T05:33:18Z]     output = self.model_runner.execute_model(scheduler_output,
[2025-05-22T05:33:18Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-22T05:33:18Z]     return func(*args, **kwargs)
[2025-05-22T05:33:18Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1121, in execute_model
[2025-05-22T05:33:18Z]     self._prepare_inputs(scheduler_output))
[2025-05-22T05:33:18Z]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 528, in _prepare_inputs
[2025-05-22T05:33:18Z]     self.input_batch.block_table.commit(num_reqs)
[2025-05-22T05:33:18Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 81, in commit
[2025-05-22T05:33:18Z]     self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
[2025-05-22T05:33:18Z] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-22T05:33:18Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-22T05:33:18Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-22T05:33:18Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-22T05:33:18Z] 
[2025-05-22T05:33:18Z] 
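All three tracebacks end in BlockTable.commit(), at the host-to-device copy self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs], ...). CUDA reports errors asynchronously, as the message notes. This copy is therefore likely just the first operation to observe a fault raised by an earlier kernel, rather than its cause.

The block-table inputs in the dump look internally consistent. This assumes a KV-cache block size of 16 tokens, which the log does not print. Under that assumption, the 281 scheduled tokens need ceil(281 / 16) = 18 blocks, which matches the 18 entries in block_ids. A quick check, where blocks_needed is a hypothetical helper:

```python
import math

# Values from the SchedulerOutput dump above.
NUM_SCHEDULED_TOKENS = 281
BLOCK_IDS = list(range(1, 19))  # block_ids=[1, 2, ..., 18]

# Assumption: KV-cache block size of 16 tokens (not shown in the log).
BLOCK_SIZE = 16


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Number of KV-cache blocks required to hold num_tokens tokens."""
    return math.ceil(num_tokens / block_size)


needed = blocks_needed(NUM_SCHEDULED_TOKENS, BLOCK_SIZE)
print(needed, len(BLOCK_IDS))  # → 18 18
```

If the counts agree, the block table being copied is plausibly well-formed. Rerunning with CUDA_LAUNCH_BLOCKING=1 is then the more direct way to locate the faulting kernel.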


Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), ci-failure (Issue about an unexpected test failure in CI)
Status: Done