
[Bug]: v0.8.5.post1 Eagle3 broken with llama3-70b #18452

Open
@fan-niu

Description

Your current environment

vllm v0.8.5.post1
NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4
NVIDIA H100 80GB HBM3

🐛 Describe the bug

vLLM 0.8.5.post1 works with EAGLE3 on meta-llama/Llama-3.1-8B-Instruct, but when I switch the model to meta-llama/Llama-3.3-70B-Instruct and send a request, the engine crashes. Please help me figure this out, thanks a lot.

Start script:

export VLLM_LOGGING_LEVEL=DEBUG
export VLLM_USE_V1=1
python3 -m vllm.entrypoints.openai.api_server \
        --model meta-llama/Llama-3.3-70B-Instruct \
        --disable-log-requests --port 8080 \
        --served-model-name zoom_llama_3_70b \
        --tensor-parallel-size 4 \
        --device cuda \
        --speculative_config '{"method": "eagle3", "model": "yuhuili/EAGLE3-LLaMA3.3-Instruct-70B", "num_speculative_tokens": 2}'
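
For reference, the failing request is a streaming chat completion against the served model name. The exact request body is not included above, so the curl call below is only a minimal sketch of what such a request could look like; the prompt content and sampling parameters are placeholders, and "zoom_llama_3_70b" matches --served-model-name from the start script:

# Assumed reproduction request (the original request body is not shown in this report)
curl http://localhost:8080/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
              "model": "zoom_llama_3_70b",
              "messages": [{"role": "user", "content": "Hello"}],
              "max_tokens": 128,
              "stream": true
            }'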

Error Log:

DEBUG 05-21 02:19:18 [loggers.py:111] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
DEBUG 05-21 02:19:28 [loggers.py:111] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(VllmWorker rank=3 pid=1820) DEBUG 05-21 02:19:38 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
(VllmWorker rank=0 pid=1817) DEBUG 05-21 02:19:38 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
(VllmWorker rank=1 pid=1818) DEBUG 05-21 02:19:38 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
(VllmWorker rank=2 pid=1819) DEBUG 05-21 02:19:38 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
DEBUG 05-21 02:19:38 [loggers.py:111] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
INFO:     127.0.0.1:41210 - "GET /v1/chat/completions HTTP/1.1" 405 Method Not Allowed
WARNING 05-21 02:19:42 [protocol.py:71] The following fields were present in the request but ignored: {'include_special_tokens'}
INFO 05-21 02:19:42 [chat_utils.py:397] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
INFO:     127.0.0.1:41216 - "POST /v1/chat/completions HTTP/1.1" 200 OK
DEBUG 05-21 02:19:42 [core.py:427] EngineCore loop active.
INFO 05-21 02:19:48 [loggers.py:111] Engine 000: Avg prompt throughput: 77.7 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
INFO 05-21 02:19:58 [loggers.py:111] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
DEBUG 05-21 02:20:08 [loggers.py:111] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
DEBUG 05-21 02:20:18 [loggers.py:111] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
ERROR 05-21 02:20:24 [core.py:398] EngineCore encountered a fatal error.
ERROR 05-21 02:20:24 [core.py:398] Traceback (most recent call last):
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 181, in collective_rpc
ERROR 05-21 02:20:24 [core.py:398]     status, result = w.worker_response_mq.dequeue(
ERROR 05-21 02:20:24 [core.py:398]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 479, in dequeue
ERROR 05-21 02:20:24 [core.py:398]     with self.acquire_read(timeout, cancel) as buf:
ERROR 05-21 02:20:24 [core.py:398]          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/contextlib.py", line 137, in __enter__
ERROR 05-21 02:20:24 [core.py:398]     return next(self.gen)
ERROR 05-21 02:20:24 [core.py:398]            ^^^^^^^^^^^^^^
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 443, in acquire_read
ERROR 05-21 02:20:24 [core.py:398]     raise TimeoutError
ERROR 05-21 02:20:24 [core.py:398] TimeoutError
ERROR 05-21 02:20:24 [core.py:398] 
ERROR 05-21 02:20:24 [core.py:398] The above exception was the direct cause of the following exception:
ERROR 05-21 02:20:24 [core.py:398] 
ERROR 05-21 02:20:24 [core.py:398] Traceback (most recent call last):
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 389, in run_engine_core
ERROR 05-21 02:20:24 [core.py:398]     engine_core.run_busy_loop()
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 413, in run_busy_loop
ERROR 05-21 02:20:24 [core.py:398]     self._process_engine_step()
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 438, in _process_engine_step
ERROR 05-21 02:20:24 [core.py:398]     outputs = self.step_fn()
ERROR 05-21 02:20:24 [core.py:398]               ^^^^^^^^^^^^^^
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 203, in step
ERROR 05-21 02:20:24 [core.py:398]     output = self.model_executor.execute_model(scheduler_output)
ERROR 05-21 02:20:24 [core.py:398]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 146, in execute_model
ERROR 05-21 02:20:24 [core.py:398]     (output, ) = self.collective_rpc("execute_model",
ERROR 05-21 02:20:24 [core.py:398]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-21 02:20:24 [core.py:398]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 193, in collective_rpc
ERROR 05-21 02:20:24 [core.py:398]     raise TimeoutError(f"RPC call to {method} timed out.") from e
ERROR 05-21 02:20:24 [core.py:398] TimeoutError: RPC call to execute_model timed out.
ERROR 05-21 02:20:24 [async_llm.py:399] AsyncLLM output_handler failed.
ERROR 05-21 02:20:24 [async_llm.py:399] Traceback (most recent call last):
ERROR 05-21 02:20:24 [async_llm.py:399]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 357, in output_handler
ERROR 05-21 02:20:24 [async_llm.py:399]     outputs = await engine_core.get_output_async()
ERROR 05-21 02:20:24 [async_llm.py:399]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-21 02:20:24 [async_llm.py:399]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 716, in get_output_async
ERROR 05-21 02:20:24 [async_llm.py:399]     raise self._format_exception(outputs) from None
ERROR 05-21 02:20:24 [async_llm.py:399] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
ERROR 05-21 02:20:24 [serving_chat.py:885] Error in chat completion stream generator.
ERROR 05-21 02:20:24 [serving_chat.py:885] Traceback (most recent call last):
ERROR 05-21 02:20:24 [serving_chat.py:885]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 487, in chat_completion_stream_generator
ERROR 05-21 02:20:24 [serving_chat.py:885]     async for res in result_generator:
ERROR 05-21 02:20:24 [serving_chat.py:885]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 306, in generate
ERROR 05-21 02:20:24 [serving_chat.py:885]     out = q.get_nowait() or await q.get()
ERROR 05-21 02:20:24 [serving_chat.py:885]                             ^^^^^^^^^^^^^
ERROR 05-21 02:20:24 [serving_chat.py:885]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/output_processor.py", line 51, in get
ERROR 05-21 02:20:24 [serving_chat.py:885]     raise output
ERROR 05-21 02:20:24 [serving_chat.py:885]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 357, in output_handler
ERROR 05-21 02:20:24 [serving_chat.py:885]     outputs = await engine_core.get_output_async()
ERROR 05-21 02:20:24 [serving_chat.py:885]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-21 02:20:24 [serving_chat.py:885]   File "/home/anaconda3/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 716, in get_output_async
ERROR 05-21 02:20:24 [serving_chat.py:885]     raise self._format_exception(outputs) from None
ERROR 05-21 02:20:24 [serving_chat.py:885] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1792]
/home/anaconda3/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/home/anaconda3/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 5 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

