
[Bug]: v0.5.5 crash: "AssertionError: expected running sequences" #8016

@zoltan-fedor

Description


Your current environment

Running the standard v0.5.5 Docker image from your Docker Hub repo, with nothing additional added to it.

🐛 Describe the bug

When running the Llama 3.1 70B AWQ model on 4× A10G 24 GB GPUs with the args:

--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--enable-prefix-caching
--num-scheduler-steps 8
--dtype half
--max-model-len 32768

vLLM crashes and requires a full restart. Error:

INFO 08-29 19:33:37 server.py:222] vLLM ZMQ RPC Server was interrupted.
Future exception was never retrieved
future: <Future finished exception=AssertionError('expected running sequences')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 111, in generate
    async for request_output in results_generator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 1064, in generate
    async for output in await self.add_request(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 113, in generator
    raise result
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
    return_value = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
    result = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 873, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 356, in step_async
    request_outputs = self._process_model_outputs(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1232, in _process_model_outputs
    self.output_processor.process_outputs(seq_group, outputs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/output_processor/multi_step.py", line 73, in process_outputs
    assert seqs, "expected running sequences"
AssertionError: expected running sequences
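To illustrate what the traceback is asserting, here is a minimal sketch (not vLLM's actual code; class and method names are simplified) of the invariant in `multi_step.py`: when a batch of multi-step outputs arrives for a sequence group, the group must still contain RUNNING sequences. If every sequence in the group has already finished by the time the queued outputs are processed, the assertion fires.

```python
# Simplified sketch of the invariant from
# vllm/engine/output_processor/multi_step.py (line 73 in the traceback).
from enum import Enum, auto


class SequenceStatus(Enum):
    RUNNING = auto()
    FINISHED = auto()


class SequenceGroup:
    """Toy stand-in for vLLM's sequence group."""

    def __init__(self, statuses):
        self.statuses = list(statuses)

    def get_running_seqs(self):
        return [s for s in self.statuses if s is SequenceStatus.RUNNING]


def process_outputs(seq_group, outputs):
    # The check that raises in the traceback above: the group must still
    # have running sequences when its multi-step outputs are applied.
    seqs = seq_group.get_running_seqs()
    assert seqs, "expected running sequences"
    return list(zip(seqs, outputs))
```

Since `--num-scheduler-steps 8` batches several decode steps' worth of outputs before they are processed, one plausible reading (our assumption, not confirmed) is that a group can reach a finished state in that window, leaving no running sequences when `process_outputs` runs.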

The issue is random; re-running the same query does NOT reproduce it.

We upgraded 6 hours ago and it has happened 3 times since.

We now need to downgrade and must consider v0.5.5 a buggy release.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), stale (Over 90 days of inactivity)
