
Conversation

@benchislett (Collaborator) commented May 29, 2025

The previous PR incorrectly checks the loaded_weights and will always set draft_id_to_target_id to None (sorry!).

This PR fixes the bug by checking for the updated name instead of 'd2t', checking for inclusion instead of exact match (more reliable in case the weight gets moved in the future), and adding an assertion to prevent regression.

  • Confirmed that the previous code hits the assertion and fails:
ERROR 05-29 15:17:48 [core.py:502]     assert logits.shape[1] == self.config.vocab_size, \
ERROR 05-29 15:17:48 [core.py:502] AssertionError: Expected logits to have shape (*, 128256), but got torch.Size([1, 32000])
  • Previous acceptance on sample data (4 requests from MT_Bench):
--------------------------------------------------
mean acceptance length: 1.12
--------------------------------------------------
acceptance at token 0:0.11
acceptance at token 1:0.01
acceptance at token 2:0.00
acceptance at token 3:0.00
  • New acceptance on the same data:
--------------------------------------------------
mean acceptance length: 2.99
--------------------------------------------------
acceptance at token 0:0.77
acceptance at token 1:0.55
acceptance at token 2:0.38
acceptance at token 3:0.29
  • To reproduce:
VLLM_USE_V1=1 python3 examples/offline_inference/eagle.py --method eagle3 --max_num_seqs 1 --num_spec_tokens 4 --dataset mt_bench.jsonl --num_prompts 4 --enforce_eager

Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of it by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@ekagra-ranjan (Contributor) commented:

Thank you for the PR!

Could you share more on what t2d, d2t, and draft_id_to_target_id are? If they are present in the yuhuili/EAGLE3-LLaMA3.1-Instruct-8B weights, could you share some examples of the parameter names?

```diff
 loaded_weights = loader.load_weights(model_weights.items())

-if 'd2t' not in loaded_weights:
+if not any('draft_id_to_target_id' in name for name in loaded_weights):
```
Member commented:

Why wouldn't:

```python
if 'draft_id_to_target_id' not in loaded_weights:
```

work? Isn't it just a set of weight names?

Collaborator (Author) commented:

I mentioned this in the description, but the idea here is to be a little more robust to future changes. Weight names are prefixed by the module they live on, such that an mlp weight would look something like model.layers.0.mlp.weight, etc. It happens that draft_id_to_target_id lives at the top level module (LlamaForCausalLM) similar to the lm_head, but if it were refactored to be a part of LlamaModel or if it were made into a class (in which case the weight might be draft_id_to_target_id.mapping or similar), this check would no longer succeed.

I am open to discussion if this seems too generic of a comparison, but I think it is a reasonable way of checking loaded weight names.
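
A minimal sketch of the difference, with hypothetical weight names:

```python
# Hypothetical example: suppose the mapping later moves under a submodule.
loaded_weights = {"lm_head.weight", "model.draft_id_to_target_id"}

# Exact match breaks as soon as the weight picks up a module prefix:
print("draft_id_to_target_id" in loaded_weights)   # False

# Substring inclusion still finds it wherever it lives:
print(any("draft_id_to_target_id" in name
          for name in loaded_weights))              # True
```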

Member commented:

> I mentioned this in the description,

I'm sorry I missed it!

> but the idea here is to be a little more robust to future changes. Weight names are prefixed by the module they live on, such that an mlp weight would look something like model.layers.0.mlp.weight, etc. It happens that draft_id_to_target_id lives at the top level module (LlamaForCausalLM) similar to the lm_head, but if it were refactored to be a part of LlamaModel or if it were made into a class (in which case the weight might be draft_id_to_target_id.mapping or similar), this check would no longer succeed.
>
> I am open to discussion if this seems too generic of a comparison, but I think it is a reasonable way of checking loaded weight names.

I'm not really experienced enough to comment on how likely such moves are, etc. But I do wonder whether and why this case is special. Wouldn't your rationale apply to all weights, in which case we would have weight-loading utility functions that handle this sort of matching generically?

It at least needs a comment; otherwise it becomes something that people are fearful of changing in the future because there might be some model out there that relies on it.

Collaborator (Author) commented:

I pushed a diff to use the AutoWeightsLoader feature for this. I think this case is somewhat unique as it enables an "optional" weight parameter, which I understand to be a bit unconventional. Let me know if this is satisfactory, or if you want additional clarification.

@benchislett (Collaborator, Author) commented:

@ekagra-ranjan These mappings enable a smaller draft_vocab_size. For yuhuili/EAGLE3-LLaMA3.1-Instruct-8B, the draft_vocab_size is 32k. In this case, it is a subset of the total target vocab (of size ~128k). Since the logits are dense, their indices need to correspond to the sampled token ids. The d2t map is an index mapping which correlates the index of a logit in the smaller vocab (of size 32k) with the corresponding index in the target vocab.
I believe t2d is the inverse mapping, and is not needed for inference.

When implementing EAGLE3, I renamed d2t to draft_id_to_target_id for clarity in our implementation. When loading weights, the d2t weight is loaded into the member for draft_id_to_target_id.

The recent PR aimed to make this optional, to support future EAGLE3-style models which do not use a smaller vocab size for the draft model. The bug is simply that when disabling the mapping, I was checking if a weight named d2t was loaded, instead of the re-named draft_id_to_target_id.
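
For concreteness, a rough sketch of how such an index map can expand the dense draft logits back to the full target vocab, assuming d2t stores per-index offsets (target_id = draft_id + d2t[draft_id]); the toy sizes and scatter below are illustrative, not vLLM's exact code:

```python
import torch

# Toy sizes standing in for draft_vocab_size=32000 and vocab_size=128256.
draft_vocab, target_vocab = 4, 10

# Assumed layout: d2t[i] is the offset from draft index i to its target
# token id, so draft ids 0..3 map to target ids 0, 3, 5, 9 here.
d2t = torch.tensor([0, 2, 3, 6])

draft_logits = torch.randn(1, draft_vocab)

# Scatter the dense draft logits into a full-width tensor so that sampled
# indices line up with real target token ids.
targets = torch.arange(draft_vocab) + d2t
full_logits = draft_logits.new_full((1, target_vocab), float("-inf"))
full_logits[:, targets] = draft_logits

assert full_logits.shape[1] == target_vocab  # the shape the new assertion checks
```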

@ekagra-ranjan (Contributor) commented:

@benchislett - could you please share where this mapping comes from and how to generate it? I did a quick search in the EAGLE3 paper and could not find it.

@benchislett (Collaborator, Author) commented:

@ekagra-ranjan it is undocumented in the paper. Its inclusion in the Hugging Face weights and the "draft_vocab_size" config value are the indicators of what it does.

It is present in the EAGLE repository; you can find it in multiple places by searching for "d2t". Here is a discussion thread on the EAGLE repository where it is explained in more detail:
@WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) on May 30, 2025
Comment on lines 217 to +220:

```python
if self.draft_id_to_target_id is None:
    assert logits.shape[1] == self.config.vocab_size, \
        "Expected logits to have shape " \
        f"(*, {self.config.vocab_size}), but got {logits.shape}"
```
Collaborator commented:

A dumb question: Now that we don't set self.draft_id_to_target_id = None, how can this branch be taken?

Collaborator commented:

@benchislett ^ Could you please answer this question?

@WoosukKwon (Collaborator) commented:

Merging this to fix the bug.
@benchislett Could you please add a test about this? We do have a basic correctness test for eagle (test_eagle_correctness), but it seems the test is too loose to capture this bug.
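
For reference, one shape such a regression test could take: run the reproduce command from the PR description and assert a floor on the printed acceptance stats. The threshold and plumbing here are illustrative, not the actual vLLM test suite:

```python
import os
import re
import subprocess

def test_eagle3_acceptance_length_floor():
    # Same command as the "To reproduce" step in the PR description.
    cmd = [
        "python3", "examples/offline_inference/eagle.py",
        "--method", "eagle3", "--max_num_seqs", "1",
        "--num_spec_tokens", "4", "--dataset", "mt_bench.jsonl",
        "--num_prompts", "4", "--enforce_eager",
    ]
    env = {**os.environ, "VLLM_USE_V1": "1"}
    out = subprocess.run(cmd, env=env, capture_output=True,
                         text=True, check=True).stdout

    match = re.search(r"mean acceptance length:\s*([0-9.]+)", out)
    assert match is not None, "acceptance stats not found in script output"
    # With the bug the mean collapsed to ~1.12; the fix restores ~2.99,
    # so any floor between the two catches a regression.
    assert float(match.group(1)) >= 2.0
```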

@WoosukKwon merged commit 1bc86a3 into vllm-project:main on Jun 1, 2025 (74 checks passed)
amitm02 pushed a commit to amitm02/vllm that referenced this pull request Jun 1, 2025
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: amit <amit.man@gmail.com>
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: minpeter <kali2005611@gmail.com>
googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>