Your current environment
Running the standard v0.5.5 Docker image from your Docker Hub repo, with nothing additional added to it.
🐛 Describe the bug
When using the Llama 3.1 70B AWQ model running on 4× A10G 24 GB GPUs with the following args (a client-call sketch follows the list):
--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--enable-prefix-caching
--num-scheduler-steps 8
--dtype half
--max-model-len 32768
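
For context, requests reach the server through vLLM's OpenAI-compatible API. Below is a minimal client sketch of the kind of call in flight when the crash happens; the base URL, port, API key, and prompt are assumptions about our deployment, not taken from any log.

```python
# Minimal client sketch against the server launched with the args above.
# Base URL/port and api_key are deployment assumptions, not from the log.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A streaming chat completion; requests of this shape hit the crash
# only sporadically, and no single prompt reproduces it deterministically.
stream = client.chat.completions.create(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    messages=[{"role": "user", "content": "Summarize this document."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

No particular request reliably triggers the failure; it surfaces under ordinary concurrent traffic.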
Under this load, vLLM eventually crashes and requires a full restart. Error:
INFO 08-29 19:33:37 server.py:222] vLLM ZMQ RPC Server was interrupted.
Future exception was never retrieved
future: <Future finished exception=AssertionError('expected running sequences')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 111, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 1064, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 113, in generator
raise result
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
return_value = task.result()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
result = task.result()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 873, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 356, in step_async
request_outputs = self._process_model_outputs(
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1232, in _process_model_outputs
self.output_processor.process_outputs(seq_group, outputs)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/output_processor/multi_step.py", line 73, in process_outputs
assert seqs, "expected running sequences"
AssertionError: expected running sequences
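
For anyone triaging: the assert that fires is in vLLM's multi-step output processor (vllm/engine/output_processor/multi_step.py, line 73 in this build). Below is a simplified, hypothetical sketch of what that check amounts to, reconstructed only from the traceback; apart from the assert line itself, every class and helper name here is an illustrative assumption, not vLLM's actual source.

```python
# Hypothetical reconstruction of the failing check in
# vllm/engine/output_processor/multi_step.py::process_outputs.
# Only the assert line is verbatim from the traceback; all other
# names and types are assumptions for illustration.
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class SequenceStatus(Enum):
    RUNNING = auto()
    FINISHED = auto()


@dataclass
class Sequence:
    seq_id: int
    status: SequenceStatus = SequenceStatus.RUNNING


@dataclass
class SequenceGroup:
    seqs: List[Sequence] = field(default_factory=list)

    def get_seqs(self, status: SequenceStatus) -> List[Sequence]:
        return [s for s in self.seqs if s.status == status]


def process_outputs(seq_group: SequenceGroup, outputs: list) -> None:
    # Multi-step scheduling buffers several steps of outputs and applies
    # them at once; the processor expects at least one sequence in the
    # group to still be RUNNING at that point.
    seqs = seq_group.get_seqs(SequenceStatus.RUNNING)
    assert seqs, "expected running sequences"
    # ... apply the buffered tokens to the running sequences ...


# If every sequence in the group has already finished (or been freed)
# by the time the buffered outputs arrive, the assert trips exactly as
# in the log above.
```

That gives a concrete shape to the race suggested by the traceback: outputs for a request arriving after its sequences have already left the RUNNING state.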
The issue is random; the same query does NOT reproduce it.
We upgraded 6 hours ago, and it has happened 3 times since.
We now need to downgrade and consider v0.5.5 a buggy release.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.