[Bug][Failing Test] entrypoints-test - test_v1_v2_api_consistency_single_prompt_tokens #18418

Closed

Description

@markmc

Your current environment

Still failing on main as of commit bca55b5

🐛 Describe the bug

Failing tests: https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests?branch=main&period=2days&query=test_v1_v2_api_consistency_single_prompt_tokens&commit=Search

FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids0] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids1] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids2] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids3] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_multi_prompt_tokens - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_multiple_sampling_params - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
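
All six failures appear to share one root cause: the first test hits a CUDA fault that kills EngineCore, and the rest fail fast against the dead engine. A likely local repro (my command, not from the CI config; assumes a CUDA GPU and the vLLM `tests/` directory as CWD, matching the paths in the output below):

```bash
pytest -v entrypoints/llm/test_generate.py -k "test_v1_v2_api_consistency or test_multiple_sampling_params"
# Per the advice in the traceback below, rerun with synchronous kernel
# launches to get an accurate faulting stack (device-side assertions
# additionally require a PyTorch build compiled with TORCH_USE_CUDA_DSA):
CUDA_LAUNCH_BLOCKING=1 pytest -v entrypoints/llm/test_generate.py -k test_v1_v2_api_consistency_single_prompt_tokens
```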
Logs
ERROR 05-20 03:26:38 [dump_input.py:68] Dumping input data
--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in execute_model
    return self.model_executor.execute_model(scheduler_output)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
    output = self.collective_rpc("execute_model",
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
    output = self.model_runner.execute_model(scheduler_output,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1110, in execute_model
    self._prepare_inputs(scheduler_output))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 503, in _prepare_inputs
    self.input_batch.block_table.commit(num_reqs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 137, in commit
    block_table.commit(num_reqs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 83, in commit
    self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.12/logging/__init__.py", line 1160, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/logging/__init__.py", line 999, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/formatter.py", line 13, in format
    msg = logging.Formatter.format(self, record)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/logging/__init__.py", line 703, in format
    record.message = record.getMessage()
                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/logging/__init__.py", line 392, in getMessage
    msg = msg % self.args
          ~~~~^~~~~~~~~~~
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 4484, in __str__
    f"compilation_config={self.compilation_config!r}")
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 3868, in __repr__
    for k, v in asdict(self).items():
                ^^^^^^^^^^^^
  File "/usr/lib/python3.12/dataclasses.py", line 1329, in asdict
    return _asdict_inner(obj, dict_factory)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/dataclasses.py", line 1339, in _asdict_inner
    f.name: _asdict_inner(getattr(obj, f.name), dict)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/dataclasses.py", line 1382, in _asdict_inner
    return type(obj)((_asdict_inner(k, dict_factory),
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/dataclasses.py", line 1383, in <genexpr>
    _asdict_inner(v, dict_factory))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/dataclasses.py", line 1386, in _asdict_inner
    return copy.deepcopy(obj)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/copy.py", line 162, in deepcopy
    y = _reconstruct(x, memo, *rv)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/copy.py", line 259, in _reconstruct
    state = deepcopy(state, memo)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/copy.py", line 136, in deepcopy
    y = copier(x, memo)
        ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/copy.py", line 221, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
                             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/copy.py", line 143, in deepcopy
    y = copier(memo)
        ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_tensor.py", line 172, in __deepcopy__
    new_storage = self._typed_storage()._deepcopy(memo)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 1134, in _deepcopy
    return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/copy.py", line 143, in deepcopy
    y = copier(memo)
        ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 239, in __deepcopy__
    new_storage = self.clone()
                  ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/storage.py", line 253, in clone
    return type(self)(self.nbytes(), device=self.device).copy_(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Call stack:
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 135, in _main
    return self._bootstrap(parent_sentinel)
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 482, in run_engine_core
    engine_core.run_busy_loop()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 509, in run_busy_loop
    self._process_engine_step()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 534, in _process_engine_step
    outputs = self.step_fn()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 222, in step
    model_output = self.execute_model(scheduler_output)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 206, in execute_model
    dump_engine_exception(self.vllm_config, scheduler_output,
  File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/dump_input.py", line 62, in dump_engine_exception
    _dump_engine_exception(config, scheduler_output, scheduler_stats)
  File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/dump_input.py", line 70, in _dump_engine_exception
    logger.error(
Unable to print the message and arguments - possible formatting error.
Use the traceback above to help find the error.
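
Worth noting: this "Logging error" is a secondary bug, separate from the CUDA fault. `dump_engine_exception` logs the `VllmConfig`, whose `CompilationConfig.__repr__` goes through `dataclasses.asdict()`; `asdict()` deep-copies leaf values, and deep-copying a CUDA tensor clones device storage, which raises again because the CUDA context is already poisoned, so the crash dump itself dies. A minimal sketch of the mechanism (hypothetical `Config` class, not vLLM's actual dataclass):

```python
from dataclasses import dataclass, field, asdict

import torch

@dataclass
class Config:
    # Stand-in for tensor-valued state held inside compilation_config.
    buf: torch.Tensor = field(
        default_factory=lambda: torch.zeros(4, device="cuda"))

# asdict() calls copy.deepcopy() on non-dataclass leaves; for a CUDA tensor
# that means storage.clone(), i.e. a fresh device allocation plus a copy.
# Once an illegal-memory-access has poisoned the context, that clone raises
# the same RuntimeError -- exactly the chain in the "Logging error" above.
asdict(Config())
```

A repr that summarizes tensors (shape/dtype/device) instead of materializing them would keep the crash dump usable after a dead context.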
ERROR 05-20 03:26:38 [dump_input.py:78] Dumping scheduler output for model execution:
ERROR 05-20 03:26:38 [dump_input.py:79] SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=0,prompt_token_ids_len=1,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=[[1]],num_computed_tokens=0,lora_request=None)],scheduled_cached_reqs=[],num_scheduled_tokens={0: 1},total_num_scheduled_tokens=1,scheduled_spec_decode_tokens={},scheduled_encoder_inputs={},num_common_prefix_blocks=[1],finished_req_ids=[],free_encoder_input_ids=[],structured_output_request_ids={},grammar_bitmask=null,kv_connector_metadata=null)
ERROR 05-20 03:26:38 [core.py:491] EngineCore encountered a fatal error.
ERROR 05-20 03:26:38 [core.py:491] Traceback (most recent call last):
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 482, in run_engine_core
ERROR 05-20 03:26:38 [core.py:491]     engine_core.run_busy_loop()
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 509, in run_busy_loop
ERROR 05-20 03:26:38 [core.py:491]     self._process_engine_step()
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 534, in _process_engine_step
ERROR 05-20 03:26:38 [core.py:491]     outputs = self.step_fn()
ERROR 05-20 03:26:38 [core.py:491]               ^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 222, in step
ERROR 05-20 03:26:38 [core.py:491]     model_output = self.execute_model(scheduler_output)
ERROR 05-20 03:26:38 [core.py:491]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 209, in execute_model
ERROR 05-20 03:26:38 [core.py:491]     raise err
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in execute_model
ERROR 05-20 03:26:38 [core.py:491]     return self.model_executor.execute_model(scheduler_output)
ERROR 05-20 03:26:38 [core.py:491]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
ERROR 05-20 03:26:38 [core.py:491]     output = self.collective_rpc("execute_model",
ERROR 05-20 03:26:38 [core.py:491]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 05-20 03:26:38 [core.py:491]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 05-20 03:26:38 [core.py:491]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
ERROR 05-20 03:26:38 [core.py:491]     return func(*args, **kwargs)
ERROR 05-20 03:26:38 [core.py:491]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 05-20 03:26:38 [core.py:491]     return func(*args, **kwargs)
ERROR 05-20 03:26:38 [core.py:491]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
ERROR 05-20 03:26:38 [core.py:491]     output = self.model_runner.execute_model(scheduler_output,
ERROR 05-20 03:26:38 [core.py:491]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 05-20 03:26:38 [core.py:491]     return func(*args, **kwargs)
ERROR 05-20 03:26:38 [core.py:491]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1110, in execute_model
ERROR 05-20 03:26:38 [core.py:491]     self._prepare_inputs(scheduler_output))
ERROR 05-20 03:26:38 [core.py:491]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 503, in _prepare_inputs
ERROR 05-20 03:26:38 [core.py:491]     self.input_batch.block_table.commit(num_reqs)
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 137, in commit
ERROR 05-20 03:26:38 [core.py:491]     block_table.commit(num_reqs)
ERROR 05-20 03:26:38 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 83, in commit
ERROR 05-20 03:26:38 [core.py:491]     self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
ERROR 05-20 03:26:38 [core.py:491] RuntimeError: CUDA error: an illegal memory access was encountered
ERROR 05-20 03:26:38 [core.py:491] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 05-20 03:26:38 [core.py:491] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 05-20 03:26:38 [core.py:491] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Process EngineCore_0:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 493, in run_engine_core
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 482, in run_engine_core
    engine_core.run_busy_loop()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 509, in run_busy_loop
    self._process_engine_step()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 534, in _process_engine_step
    outputs = self.step_fn()
              ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 222, in step
    model_output = self.execute_model(scheduler_output)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 209, in execute_model
    raise err
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in execute_model
    return self.model_executor.execute_model(scheduler_output)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
    output = self.collective_rpc("execute_model",
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 276, in execute_model
    output = self.model_runner.execute_model(scheduler_output,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1110, in execute_model
    self._prepare_inputs(scheduler_output))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 503, in _prepare_inputs
    self.input_batch.block_table.commit(num_reqs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 137, in commit
    block_table.commit(num_reqs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/block_table.py", line 83, in commit
    self.block_table[:num_reqs].copy_(self.block_table_cpu[:num_reqs],
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
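
For context on the primary failure: the frame where the error surfaces (`block_table.py:83`) is just a sliced host-to-device copy of the block table, so the illegal access almost certainly originates in an earlier asynchronous kernel and is merely reported here, as the CUDA error text itself warns. A rough sketch of the pattern at that line (shapes invented; the `copy_` call is truncated in the log, so the `non_blocking=True` argument is an assumption based on the usual async-copy idiom):

```python
import torch

max_reqs, max_blocks = 64, 16
# GPU-resident block table plus its pinned CPU staging buffer.
block_table = torch.zeros(max_reqs, max_blocks, dtype=torch.int32,
                          device="cuda")
block_table_cpu = torch.zeros(max_reqs, max_blocks, dtype=torch.int32,
                              pin_memory=True)

num_reqs = 8
# The failing line is essentially this sliced H2D copy. Because CUDA reports
# kernel faults asynchronously, a bug in any earlier kernel can surface on
# this innocuous call; CUDA_LAUNCH_BLOCKING=1 forces synchronous reporting
# so the true faulting site appears in the stack.
block_table[:num_reqs].copy_(block_table_cpu[:num_reqs], non_blocking=True)
```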

FAILED
entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids1] 

Adding requests:   0% 0/1 [00:00<?, ?it/s]
Adding requests:   0% 0/1 [00:00<?, ?it/s]

Processed prompts:   0% 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
FAILED
entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids2] 
Adding requests:   0% 0/1 [00:00<?, ?it/s]
Adding requests:   0% 0/1 [00:00<?, ?it/s]
FAILED
entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids3] 
Adding requests:   0% 0/1 [00:00<?, ?it/s]
Adding requests:   0% 0/1 [00:00<?, ?it/s]
FAILED
entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_multi_prompt_tokens 
Adding requests:   0% 0/4 [00:00<?, ?it/s]
Adding requests:   0% 0/4 [00:00<?, ?it/s]
FAILED
entrypoints/llm/test_generate.py::test_multiple_sampling_params 
Adding requests:   0% 0/4 [00:00<?, ?it/s]
Adding requests:   0% 0/4 [00:00<?, ?it/s]
FAILED[rank0]:[W520 03:26:39.408248601 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())


=================================== FAILURES ===================================
______ test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids0] ______

llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
prompt_token_ids = [0]

    @pytest.mark.skip_global_cleanup
    @pytest.mark.parametrize('prompt_token_ids', TOKEN_IDS)
    def test_v1_v2_api_consistency_single_prompt_tokens(llm: LLM,
                                                        prompt_token_ids):
        sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
    
        with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
>           v1_output = llm.generate(prompt_token_ids=prompt_token_ids,
                                     sampling_params=sampling_params)

entrypoints/llm/test_generate.py:56: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
    return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:479: in generate
    outputs = self._run_engine(use_tqdm=use_tqdm)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1464: in _run_engine
    step_outputs = self.llm_engine.step()
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:223: in step
    outputs = self.engine_core.get_output()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>

    def get_output(self) -> EngineCoreOutputs:
        # If an exception arises in process_outputs_socket task,
        # it is forwarded to the outputs_queue so we can raise it
        # from this (run_output_handler) task to shut down the server.
        outputs = self.outputs_queue.get()
        if isinstance(outputs, Exception):
>           raise self._format_exception(outputs) from None
E           vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:647: EngineDeadError
______ test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids1] ______

llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
prompt_token_ids = [0, 1]

    @pytest.mark.skip_global_cleanup
    @pytest.mark.parametrize('prompt_token_ids', TOKEN_IDS)
    def test_v1_v2_api_consistency_single_prompt_tokens(llm: LLM,
                                                        prompt_token_ids):
        sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
    
        with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
>           v1_output = llm.generate(prompt_token_ids=prompt_token_ids,
                                     sampling_params=sampling_params)

entrypoints/llm/test_generate.py:56: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
    return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
    self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
    self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
    self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
    self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
    self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
    self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>

    def ensure_alive(self):
        if self.resources.engine_dead:
>           raise EngineDeadError()
E           vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
______ test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids2] ______

llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
prompt_token_ids = [0, 2, 1]

    @pytest.mark.skip_global_cleanup
    @pytest.mark.parametrize('prompt_token_ids', TOKEN_IDS)
    def test_v1_v2_api_consistency_single_prompt_tokens(llm: LLM,
                                                        prompt_token_ids):
        sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
    
        with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
>           v1_output = llm.generate(prompt_token_ids=prompt_token_ids,
                                     sampling_params=sampling_params)

entrypoints/llm/test_generate.py:56: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
    return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
    self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
    self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
    self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
    self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
    self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
    self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>

    def ensure_alive(self):
        if self.resources.engine_dead:
>           raise EngineDeadError()
E           vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
______ test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids3] ______

llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>
prompt_token_ids = [0, 3, 1, 2]

    @pytest.mark.skip_global_cleanup
    @pytest.mark.parametrize('prompt_token_ids', TOKEN_IDS)
    def test_v1_v2_api_consistency_single_prompt_tokens(llm: LLM,
                                                        prompt_token_ids):
        sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
    
        with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
>           v1_output = llm.generate(prompt_token_ids=prompt_token_ids,
                                     sampling_params=sampling_params)

entrypoints/llm/test_generate.py:56: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
    return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
    self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
    self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
    self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
    self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
    self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
    self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>

    def ensure_alive(self):
        if self.resources.engine_dead:
>           raise EngineDeadError()
E           vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
________________ test_v1_v2_api_consistency_multi_prompt_tokens ________________

llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>

    @pytest.mark.skip_global_cleanup
    def test_v1_v2_api_consistency_multi_prompt_tokens(llm: LLM):
        sampling_params = SamplingParams(temperature=0.0, top_p=1.0)
    
        with pytest.warns(DeprecationWarning, match="'prompt_token_ids'"):
>           v1_output = llm.generate(prompt_token_ids=TOKEN_IDS,
                                     sampling_params=sampling_params)

entrypoints/llm/test_generate.py:69: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
    return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
    self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
    self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
    self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
    self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
    self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
    self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>

    def ensure_alive(self):
        if self.resources.engine_dead:
>           raise EngineDeadError()
E           vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
________________________ test_multiple_sampling_params _________________________

llm = <weakproxy at 0x7f13f18ea7f0 to LLM at 0x7f13f4043980>

    @pytest.mark.skip_global_cleanup
    def test_multiple_sampling_params(llm: LLM):
        sampling_params = [
            SamplingParams(temperature=0.01, top_p=0.95),
            SamplingParams(temperature=0.3, top_p=0.95),
            SamplingParams(temperature=0.7, top_p=0.95),
            SamplingParams(temperature=0.99, top_p=0.95),
        ]
    
        # Multiple SamplingParams should be matched with each prompt
>       outputs = llm.generate(PROMPTS, sampling_params=sampling_params)

entrypoints/llm/test_generate.py:91: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.12/dist-packages/vllm/utils.py:1212: in inner
    return fn(*args, **kwargs)
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:469: in generate
    self._validate_and_add_requests(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1392: in _validate_and_add_requests
    self._add_request(
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:1412: in _add_request
    self.llm_engine.add_request(
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:198: in add_request
    self.engine_core.add_request(request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:675: in add_request
    self._send_input(EngineCoreRequestType.ADD, request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:651: in _send_input
    self.ensure_alive()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f13f40599a0>

    def ensure_alive(self):
        if self.resources.engine_dead:
>           raise EngineDeadError()
E           vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:554: EngineDeadError
=============================== warnings summary ===============================
../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
  /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids0] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids1] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids2] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_single_prompt_tokens[prompt_token_ids3] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_v1_v2_api_consistency_multi_prompt_tokens - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED entrypoints/llm/test_generate.py::test_multiple_sampling_params - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
======================== 6 failed, 1 warning in 35.48s =========================
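
Why one GPU fault turns into six failures: the tests share a single `LLM` fixture (the same weakproxy address `0x7f13f18ea7f0` appears in every failure), and once EngineCore dies the client's dead flag is sticky, so each subsequent `add_request` fails fast in `ensure_alive()` without touching the GPU. A toy model of that fail-fast behavior (hypothetical class; the real logic lives in `SyncMPClient`):

```python
class ToyEngineClient:
    """Toy model of the client-side fail-fast seen in the tracebacks."""

    def __init__(self) -> None:
        self.engine_dead = False

    def on_engine_crash(self) -> None:
        # Set once, when the output handler observes the engine process die;
        # it is never cleared, so the whole shared fixture becomes unusable.
        self.engine_dead = True

    def ensure_alive(self) -> None:
        if self.engine_dead:
            raise RuntimeError("EngineDeadError")

    def add_request(self, request: object) -> None:
        self.ensure_alive()  # later tests die here, not on the GPU
```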
🚨 Error: The command exited with status 1
user command error: The plugin docker command hook exited with status 1


Labels: bug (Something isn't working), ci-failure (Issue about an unexpected test failure in CI)
