-
-
Notifications
You must be signed in to change notification settings - Fork 10.3k
[Bugfix] Fix EAGLE3 broken logits #18909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
Thank you for the PR! Could you share more on what is t2d, d2t, and |
loaded_weights = loader.load_weights(model_weights.items()) | ||
|
||
if 'd2t' not in loaded_weights: | ||
if not any('draft_id_to_target_id' in name for name in loaded_weights): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't:
if 'draft_id_to_target_id' not in loaded_weights:
work? Isn't it just a set
of weight names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned this in the description, but the idea here is to be a little more robust to future changes. Weight names are prefixed by the module they live on, such that an mlp weight would look something like model.layers.0.mlp.weight
, etc. It happens that draft_id_to_target_id
lives at the top level module (LlamaForCausalLM) similar to the lm_head, but if it were refactored to be a part of LlamaModel
or if it were made into a class (in which case the weight might be draft_id_to_target_id.mapping
or similar), this check would no longer succeed.
I am open to discussion if this seems too generic of a comparison, but I think it is a reasonable way of checking loaded weight names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned this in the description,
I'm sorry I missed it!
but the idea here is to be a little more robust to future changes. Weight names are prefixed by the module they live on, such that an mlp weight would look something like
model.layers.0.mlp.weight
, etc. It happens thatdraft_id_to_target_id
lives at the top level module (LlamaForCausalLM) similar to the lm_head, but if it were refactored to be a part ofLlamaModel
or if it were made into a class (in which case the weight might bedraft_id_to_target_id.mapping
or similar), this check would no longer succeed.I am open to discussion if this seems too generic of a comparison, but I think it is a reasonable way of checking loaded weight names.
I'm not really experienced enough to comment on how likely such moves are, etc. But I do wonder whether and why this case is special? Wouldn't your rationale apply to all weights, in which case we would have weight loading utility functions that would handle this sort of matching generically
It at least needs a comment - otherwise it becomes something that people are fearful of changing in future because there might be some model out there that relies on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a diff to use the AutoWeightsLoader feature for this. I think this case is somewhat unique as it enables an "optional" weight parameter, which I understand to be a bit unconventional. Let me know if this is satisfactory, or if you want additional clarification.
@ekagra-ranjan These mappings enable a smaller draft_vocab_size. For When implementing EAGLE3, I renamed The recent PR aimed to make this optional, to support future EAGLE3-style models which do not use a smaller vocab size for the draft model. The bug is simply that when disabling the mapping, I was checking if a weight named |
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
@benchislett - could you pls share where does this mapping come from and how to generate this? I did a quick find in the EAGLE3 paper and could not find it. |
@ekagra-ranjan it is undocumented in the paper. The inclusion in the huggingface weights and "draft_vocab_size" are indicators of what it does. It is present in the EAGLE repository, which you can find in multiple places by searching for "d2t". Here is a discussion thread on the EAGLE repository where it is explained in more detail: |
if self.draft_id_to_target_id is None: | ||
assert logits.shape[1] == self.config.vocab_size, \ | ||
"Expected logits to have shape " \ | ||
f"(*, {self.config.vocab_size}), but got {logits.shape}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A dumb question: Now that we don't set self.draft_id_to_target_id = None
, how can this branch be taken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benchislett ^ Could you please answer this question?
Merging this to fix the bug. |
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: amit <amit.man@gmail.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: amit <amit.man@gmail.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: minpeter <kali2005611@gmail.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
This PR incorrectly checks the loaded_weights and will always set
draft_id_to_target_id
toNone
(sorry!).This PR fixes the bug by checking the updated name instead of 'd2t', checks for inclusion instead of exact-match for a more reliable check in case it gets moved in the future, and adds an assertion to prevent regression.