Description
Your current environment
Still failing on `main` as of commit bca55b5.
🐛 Describe the bug
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids0] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids1] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids2] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids3] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_multi_prompt_tokens - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_multiple_sampling_params - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
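All six failures share a single root cause: the engine core dies on the first scheduled request with a CUDA illegal memory access (full traceback in the logs below). For reference, a minimal standalone sketch of the code path the tests exercise; the model name is an assumption, since the real tests use the shared `llm` fixture from `entrypoints/llm/test_generate.py`:

```python
# Hypothetical standalone repro of the failing path. The actual tests use
# the shared `llm` fixture; the model choice here is an assumption and any
# small model should reach the same BlockTable.commit() code.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.0, top_p=1.0)

# Deprecated keyword form used by the v1/v2 consistency tests; it emits a
# DeprecationWarning mentioning 'prompt_token_ids'.
outputs = llm.generate(prompt_token_ids=[0], sampling_params=params)
print(outputs[0].outputs[0].text)
```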
Logs
ERROR 05-20 03:26:38 [dump_input.py:68] Dumping input data
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in execute_model
return self.model_executor.execute_model(scheduler_output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
output = self.collective_rpc("execute_model",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
output = self.model_runner.execute_model(scheduler_output,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1110, in execute_model
self._prepare_inputs(scheduler_output))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 503, in _prepare_inputs
self.input_batch.block_table.commit(num_reqs)
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 137, in commit
block_table.commit(num_reqs)
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 83, in commit
self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.12/logging/__init__.py", line 1160, in emit
msg = self.format(record)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/logging/__init__.py", line 999, in format
return fmt.format(record)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/formatter.py", line 13, in format
msg = logging.Formatter.format(self, record)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/logging/__init__.py", line 703, in format
record.message = record.getMessage()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/logging/__init__.py", line 392, in getMessage
msg = msg % self.args
~~~~^~~~~~~~~~~
File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 4484, in __str__
f"compilation_config={self.compilation_config!r}")
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 3868, in __repr__
for k, v in asdict(self).items():
^^^^^^^^^^^^
File "/usr/lib/python3.12/dataclasses.py", line 1329, in asdict
return _asdict_inner(obj, dict_factory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/dataclasses.py", line 1339, in _asdict_inner
f.name: _asdict_inner(getattr(obj, f.name), dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/dataclasses.py", line 1382, in _asdict_inner
return type(obj)((_asdict_inner(k, dict_factory),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/dataclasses.py", line 1383, in <genexpr>
_asdict_inner(v, dict_factory))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/dataclasses.py", line 1386, in _asdict_inner
return copy.deepcopy(obj)
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/copy.py", line 162, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/copy.py", line 259, in _reconstruct
state = deepcopy(state, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/copy.py", line 136, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/copy.py", line 221, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/copy.py", line 143, in deepcopy
y = copier(memo)
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_tensor.py", line 172, in __deepcopy__
new_storage = self._typed_storage()._deepcopy(memo)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 1134, in _deepcopy
return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/copy.py", line 143, in deepcopy
y = copier(memo)
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 239, in __deepcopy__
new_storage = self.clone()
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 253, in clone
return type(self)(self.nbytes(), device=self.device).copy_(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Call stack:
File "<string>", line 1, in <module>
File "/usr/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/lib/python3.12/multiprocessing/spawn.py", line 135, in _main
return self._bootstrap(parent_sentinel)
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 482, in run_engine_core
engine_core.run_busy_loop()
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 509, in run_busy_loop
self._process_engine_step()
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 534, in _process_engine_step
outputs = self.step_fn()
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 222, in step
model_output = self.execute_model(scheduler_output)
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 206, in execute_model
dump_engine_exception(self.vllm_config, scheduler_output,
File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/dump_input.py", line 62, in dump_engine_exception
_dump_engine_exception(config, scheduler_output, scheduler_stats)
File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/dump_input.py", line 70, in _dump_engine_exception
logger.error(
Unable to print the message and arguments - possible formatting error.
Use the traceback above to help find the error.
ERROR 05-20 03:26:38 [dump_input.py:78] Dumping scheduler output for model execution:
ERROR 05-20 03:26:38 [dump_input.py:79] SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=0,prompt_token_ids_len=1,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=[[1]],num_computed_tokens=0,lora_request=None)],scheduled_cached_reqs=[],num_scheduled_tokens={0: 1},total_num_scheduled_tokens=1,scheduled_spec_decode_tokens={},scheduled_encoder_inputs={},num_common_prefix_blocks=[1],finished_req_ids=[],free_encoder_input_ids=[],structured_output_request_ids={},grammar_bitmask=null,kv_connector_metadata=null)
ERROR 05-20 03:26:38 [core.py:491] EngineCore encountered a fatal error.
ERROR 05-20 03:26:38 [core.py:491] Traceback (most recent call last):
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 482, in run_engine_core
ERROR 05-20 03:26:38 [core.py:491] engine_core.run_busy_loop()
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 509, in run_busy_loop
ERROR 05-20 03:26:38 [core.py:491] self._process_engine_step()
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 534, in _process_engine_step
ERROR 05-20 03:26:38 [core.py:491] outputs = self.step_fn()
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 222, in step
ERROR 05-20 03:26:38 [core.py:491] model_output = self.execute_model(scheduler_output)
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 209, in execute_model
ERROR 05-20 03:26:38 [core.py:491] raise err
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in execute_model
ERROR 05-20 03:26:38 [core.py:491] return self.model_executor.execute_model(scheduler_output)
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
ERROR 05-20 03:26:38 [core.py:491] output = self.collective_rpc("execute_model",
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 05-20 03:26:38 [core.py:491] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
ERROR 05-20 03:26:38 [core.py:491] return func(*args, **kwargs)
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 05-20 03:26:38 [core.py:491] return func(*args, **kwargs)
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
ERROR 05-20 03:26:38 [core.py:491] output = self.model_runner.execute_model(scheduler_output,
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 05-20 03:26:38 [core.py:491] return func(*args, **kwargs)
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1110, in execute_model
ERROR 05-20 03:26:38 [core.py:491] self._prepare_inputs(scheduler_output))
ERROR 05-20 03:26:38 [core.py:491] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 503, in _prepare_inputs
ERROR 05-20 03:26:38 [core.py:491] self.input_batch.block_table.commit(num_reqs)
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 137, in commit
ERROR 05-20 03:26:38 [core.py:491] block_table.commit(num_reqs)
ERROR 05-20 03:26:38 [core.py:491] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 83, in commit
ERROR 05-20 03:26:38 [core.py:491] self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
ERROR 05-20 03:26:38 [core.py:491] RuntimeError: CUDA error: an illegal memory access was encountered
ERROR 05-20 03:26:38 [core.py:491] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 05-20 03:26:38 [core.py:491] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 05-20 03:26:38 [core.py:491] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 05-20 03:26:38 [core.py:491]
Process EngineCore_0:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 493, in run_engine_core
raise e
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 482, in run_engine_core
engine_core.run_busy_loop()
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 509, in run_busy_loop
self._process_engine_step()
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 534, in _process_engine_step
outputs = self.step_fn()
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 222, in step
model_output = self.execute_model(scheduler_output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 209, in execute_model
raise err
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in execute_model
return self.model_executor.execute_model(scheduler_output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
output = self.collective_rpc("execute_model",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
output = self.model_runner.execute_model(scheduler_output,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1110, in execute_model
self._prepare_inputs(scheduler_output))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 503, in _prepare_inputs
self.input_batch.block_table.commit(num_reqs)
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 137, in commit
block_table.commit(num_reqs)
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 83, in commit
self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
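Note that the line the fault surfaces on is only a host-to-device slice copy; per the warning in the error text, the illegal access most likely originates from an earlier, asynchronously launched kernel. For context, a simplified sketch of the `commit` path in `vllm/v1/worker/block_table.py`, paraphrased from the traceback above (the trailing `copy_` argument is truncated in the log and is assumed here to be `non_blocking=True`):

```python
import torch

class BlockTable:
    """Simplified sketch of vllm/v1/worker/block_table.py, paraphrased
    from the traceback; shapes and dtypes are illustrative only."""

    def __init__(self, max_reqs: int, max_blocks: int):
        self.block_table = torch.zeros(max_reqs, max_blocks,
                                       dtype=torch.int32, device="cuda")
        self.block_table_cpu = torch.zeros(max_reqs, max_blocks,
                                           dtype=torch.int32,
                                           pin_memory=True)

    def commit(self, num_reqs: int) -> None:
        # Async H2D copy of the per-request block tables. Because the copy
        # is non-blocking, a fault from an earlier kernel launch can be
        # reported on this line instead of at its real source.
        self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
                                          non_blocking=True)
```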
FAILED
entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids1]
Adding requests: 0% 0/1 [00:00<?, ?it/s]
Adding requests: 0% 0/1 [00:00<?, ?it/s]
Processed prompts: 0% 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
FAILED
entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids2]
Adding requests: 0% 0/1 [00:00<?, ?it/s]
Adding requests: 0% 0/1 [00:00<?, ?it/s]
FAILED
entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids3]
Adding requests: 0% 0/1 [00:00<?, ?it/s]
Adding requests: 0% 0/1 [00:00<?, ?it/s]
FAILED
entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_multi_prompt_tokens
Adding requests: 0% 0/4 [00:00<?, ?it/s]
Adding requests: 0% 0/4 [00:00<?, ?it/s]
FAILED
entrypoints/llm/test_generate.py::test_multiple_sampling_params
Adding requests: 0% 0/4 [00:00<?, ?it/s]
Adding requests: 0% 0/4 [00:00<?, ?it/s]
FAILED
[rank0]:[W520 03:26:39.408248601 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
=================================== FAILURES ===================================
______ test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids0] ______
llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
prompt_token_ids = [0]
@pytest.mark.skip_global_cleanup
@pytest.mark.parametrize('prompt_token_ids', TOKEN_IDS)
def test_v1_v2_api_consistency_single_prompt_tokens(llm: LLM,
prompt_token_ids):
sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
> v1_output = llm.generate(prompt_token_ids=prompt_token_ids,
sampling_params=sampling_params)
entrypoints/llm/test_generate.py:56:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:479: in generate
outputs = self._run_engine(use_tqdm=use_tqdm)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1464: in _run_engine
step_outputs = self.llm_engine.step()
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:223: in step
outputs = self.engine_core.get_output()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>
def get_output(self) -> EngineCoreOutputs:
# If an exception arises in process_outputs_socket task,
# it is forwarded to the outputs_queue so we can raise it
# from this (run_output_handler) task to shut down the server.
outputs = self.outputs_queue.get()
if isinstance(outputs, Exception):
> raise self._format_exception(outputs) from None
E vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:647: EngineDeadError
______ test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids1] ______
llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
prompt_token_ids = [0, 1]
@pytest.mark.skip_global_cleanup
@pytest.mark.parametrize('prompt_token_ids', TOKEN_IDS)
def test_v1_v2_api_consistency_single_prompt_tokens(llm: LLM,
prompt_token_ids):
sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
> v1_output = llm.generate(prompt_token_ids=prompt_token_ids,
sampling_params=sampling_params)
entrypoints/llm/test_generate.py:56:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>
def ensure_alive(self):
if self.resources.engine_dead:
> raise EngineDeadError()
E vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
______ test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids2] ______
llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
prompt_token_ids = [0, 2, 1]
@pytest.mark.skip_global_cleanup
@pytest.mark.parametrize('prompt_token_ids', TOKEN_IDS)
def test_v1_v2_api_consistency_single_prompt_tokens(llm: LLM,
prompt_token_ids):
sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
> v1_output = llm.generate(prompt_token_ids=prompt_token_ids,
sampling_params=sampling_params)
entrypoints/llm/test_generate.py:56:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>
def ensure_alive(self):
if self.resources.engine_dead:
> raise EngineDeadError()
E vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
______ test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids3] ______
llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
prompt_token_ids = [0, 3, 1, 2]
@pytest.mark.skip_global_cleanup
@pytest.mark.parametrize('prompt_token_ids', TOKEN_IDS)
def test_v1_v2_api_consistency_single_prompt_tokens(llm: LLM,
prompt_token_ids):
sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
> v1_output = llm.generate(prompt_token_ids=prompt_token_ids,
sampling_params=sampling_params)
entrypoints/llm/test_generate.py:56:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>
def ensure_alive(self):
if self.resources.engine_dead:
> raise EngineDeadError()
E vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
________________ test_v1_v2_api_consistency_multi_prompt_tokens ________________
llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
@pytest.mark.skip_global_cleanup
def test_v1_v2_api_consistency_multi_prompt_tokens(llm: LLM):
sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
> v1_output = llm.generate(prompt_token_ids=TOKEN_IDS,
sampling_params=sampling_params)
entrypoints/llm/test_generate.py:69:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>
def ensure_alive(self):
if self.resources.engine_dead:
> raise EngineDeadError()
E vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
________________________ test_multiple_sampling_params _________________________
llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
@pytest.mark.skip_global_cleanup
def test_multiple_sampling_params(llm: LLM):
sampling_params = [
SamplingParams(temperature=0.01, top_p=0.95),
SamplingParams(temperature=0.3, top_p=0.95),
SamplingParams(temperature=0.7, top_p=0.95),
SamplingParams(temperature=0.99, top_p=0.95),
]
# Multiple SamplingParams should be matched with each prompt
> outputs = llm.generate(PROMPTS, sampling_params=sampling_params)
entrypoints/llm/test_generate.py:91:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>
def ensure_alive(self):
if self.resources.engine_dead:
> raise EngineDeadError()
E vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
=============================== warnings summary ===============================
../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
/usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
ref_error: type[Exception] = jsonschema.RefResolutionError,
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids0] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids1] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids2] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids3] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_multi_prompt_tokens - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_multiple_sampling_params - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
======================== 6 failed, 1 warning in 35.48s =========================
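As the error text itself suggests, rerunning with `CUDA_LAUNCH_BLOCKING=1` should make the reported stack point at the actual faulting kernel rather than at the later `copy_`. A sketch of forcing it from Python; it has to be set before the first CUDA context is created, so exporting it in the shell before invoking pytest works just as well:

```python
# CUDA_LAUNCH_BLOCKING must be set before the first CUDA context is
# created, i.e. before importing anything that initializes the GPU.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM  # noqa: E402  (imported after the env var on purpose)

llm = LLM(model="facebook/opt-125m")  # hypothetical small model, as above
```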