
[Bug]: EAGLE-3 loading error for Llama 3.3 70b #19174

Open
@Neo9061

Description


Your current environment

I am pulling the latest vLLM v0.9.0.1 and installing the latest fixed PR for EAGLE-3.

The instance is 8×H100.

🐛 Describe the bug

I am using the serving command below. The EAGLE-3 head for Llama 3.3 70B is from its official release:

python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.3-70B-Instruct  --seed 42 --tensor-parallel-size 8 --max-model-len 131072 --max-num-batched-tokens 131072 --max-num-seqs 100 --max_seq_len_to_capture 131072 --gpu_memory_utilization 0.9 --no-enable-prefix-caching --speculative_config '{
    "model": "yuhuili/EAGLE3-LLaMA3.3-Instruct-70B",
    "draft_tensor_parallel_size": 8,
    "num_speculative_tokens":5,
    "method": "eagle3"
    }'
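For reference, the `--speculative_config` value above is a JSON string; a minimal sketch of building it programmatically (the dict keys are taken verbatim from the command above, and the assumption that the same dict can be passed to vLLM's Python `LLM(..., speculative_config=...)` entry point is mine, not stated in this report):

```python
import json

# Speculative decoding config, mirroring the CLI JSON above.
spec = {
    "model": "yuhuili/EAGLE3-LLaMA3.3-Instruct-70B",
    "draft_tensor_parallel_size": 8,
    "num_speculative_tokens": 5,
    "method": "eagle3",
}

# The CLI flag expects this dict serialized as a JSON string.
cli_arg = json.dumps(spec)
```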

Then I hit this error:

(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/home/ubuntu/test_code/open_vllm/vllm/vllm/model_executor/models/llama_eagle3.py", line 205, in forward
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return self.model(input_ids, positions, hidden_states)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/home/ubuntu/test_code/open_vllm/vllm/vllm/compilation/decorators.py", line 238, in __call__
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     output = self.compiled_callable(*args, **kwargs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return fn(*args, **kwargs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/home/ubuntu/test_code/open_vllm/vllm/vllm/model_executor/models/llama_eagle3.py", line 121, in forward
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     def forward(
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return fn(*args, **kwargs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/fx/graph_module.py", line 830, in call_wrapped
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/fx/graph_module.py", line 406, in __call__
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     raise e
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/fx/graph_module.py", line 393, in __call__
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "<eval_with_key>.167", line 18, in forward
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_hidden_states_, l_self_modules_layers_modules_0_modules_hidden_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_);  l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_hidden_norm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_positions_ = l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_ = None
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/home/ubuntu/test_code/open_vllm/vllm/vllm/compilation/cuda_piecewise_backend.py", line 110, in __call__
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return self.compiled_graph_for_general_shape(*args)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/home/ubuntu/test_code/open_vllm/vllm/vllm/compilation/compiler_interface.py", line 489, in compiled_graph
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     graph_output = inductor_compiled_graph(list_args)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/opt/pytorch/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 460, in __call__
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     return self.current_callable(inputs)
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]   File "/home/ubuntu/.cache/vllm/torch_compile_cache/3825025aed-1/rank_2_0/inductor_cache/ct/ccttgk63mie75qcg7t4swtaeuygoparq7phni4yhir5chq2tpbsb.py", line 365, in call
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522]     assert_size_stride(arg2_1, (16032, 8192), (8192, 1))
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522] AssertionError: expected size 16032==16032, stride 6144==8192 at dim=0; expected size 6144==8192, stride 1==1 at dim=1
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522] This error most often comes from a incorrect fake (aka meta) kernel for a custom op.
(VllmWorker rank=2 pid=1204085) ERROR 06-05 02:08:42 [multiproc_executor.py:522] Use torch.library.opcheck to test your custom op.
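For context, `assert_size_stride` is a guard that Inductor-compiled code emits: the compiled graph is specialized to the tensor sizes and strides observed at trace time, and it asserts that runtime tensors still match. The message above suggests the graph was traced with a (16032, 8192) contiguous weight but received a tensor whose second dimension is 6144. A pure-Python sketch of what such a guard checks (illustrative only, not the actual Inductor implementation; the message format is simplified):

```python
# Illustrative size/stride guard, in the spirit of Inductor's
# assert_size_stride. A compiled graph is specialized to the shapes
# seen at trace time; a tensor with a different dimension fails it.

def assert_size_stride(size, stride, expected_size, expected_stride):
    for dim, (s, st, es, est) in enumerate(
        zip(size, stride, expected_size, expected_stride)
    ):
        if s != es or st != est:
            raise AssertionError(
                f"expected size {es}=={s}, stride {est}=={st} at dim={dim}"
            )

# Traced: (16032, 8192) contiguous. Runtime: (16032, 6144) contiguous.
try:
    assert_size_stride((16032, 6144), (6144, 1), (16032, 8192), (8192, 1))
except AssertionError as e:
    print(e)
```

Here the guard trips on the stride of dim 0 (6144 vs. 8192), matching the failure in the log: the runtime weight shard is narrower than the one the graph was compiled against.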

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests