Your current environment
Running the standard v0.5.5 Docker image from your Docker Hub repo, with nothing additional added to it.
🐛 Describe the bug
When using the Llama 3.1 70B AWQ model running on 4× A10G 24 GB GPUs with the following args (a client-call sketch follows the list):
--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--enable-prefix-caching
--num-scheduler-steps 8
--dtype half
--max-model-len 32768
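
For context, requests reach the server through vLLM's OpenAI-compatible API. Below is a minimal client sketch of the kind of call in flight when the crash happens; the base URL, port, API key, and prompt are assumptions about our deployment, not taken from any log.

```python
# Minimal client sketch against the server launched with the args above.
# Base URL/port and api_key are deployment assumptions, not from the log.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A streaming chat completion; requests of this shape hit the crash
# only sporadically, and no single prompt reproduces it deterministically.
stream = client.chat.completions.create(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    messages=[{"role": "user", "content": "Summarize this document."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

No particular request reliably triggers the failure; it surfaces under ordinary concurrent traffic.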
Under this load, vLLM eventually crashes and requires a full restart. Error:
INFO 08-29 19:33:37 server.py:222] vLLM ZMQ RPC Server was interrupted.
Future exception was never retrieved
future: <Future finished exception=AssertionError('expected running sequences')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 111, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 1064, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 113, in generator
raise result
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
return_value = task.result()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
result = task.result()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 873, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 356, in step_async
request_outputs = self._process_model_outputs(
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1232, in _process_model_outputs
self.output_processor.process_outputs(seq_group, outputs)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/output_processor/multi_step.py", line 73, in process_outputs
assert seqs, "expected running sequences"
AssertionError: expected running sequences
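
For anyone triaging: the assert that fires is in vLLM's multi-step output processor (vllm/engine/output_processor/multi_step.py, line 73 in this build). Below is a simplified, hypothetical sketch of what that check amounts to, reconstructed only from the traceback; apart from the assert line itself, every class and helper name here is an illustrative assumption, not vLLM's actual source.

```python
# Hypothetical reconstruction of the failing check in
# vllm/engine/output_processor/multi_step.py::process_outputs.
# Only the assert line is verbatim from the traceback; all other
# names and types are assumptions for illustration.
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class SequenceStatus(Enum):
    RUNNING = auto()
    FINISHED = auto()


@dataclass
class Sequence:
    seq_id: int
    status: SequenceStatus = SequenceStatus.RUNNING


@dataclass
class SequenceGroup:
    seqs: List[Sequence] = field(default_factory=list)

    def get_seqs(self, status: SequenceStatus) -> List[Sequence]:
        return [s for s in self.seqs if s.status == status]


def process_outputs(seq_group: SequenceGroup, outputs: list) -> None:
    # Multi-step scheduling buffers several steps of outputs and applies
    # them at once; the processor expects at least one sequence in the
    # group to still be RUNNING at that point.
    seqs = seq_group.get_seqs(SequenceStatus.RUNNING)
    assert seqs, "expected running sequences"
    # ... apply the buffered tokens to the running sequences ...


# If every sequence in the group has already finished (or been freed)
# by the time the buffered outputs arrive, the assert trips exactly as
# in the log above.
```

That gives a concrete shape to the race suggested by the traceback: outputs for a request arriving after its sequences have already left the RUNNING state.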
The issue is random; the same query does NOT reproduce it.
We upgraded 6 hours ago, and it has happened 3 times since.
We now need to downgrade and consider v0.5.5 a buggy release.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.