[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B #19033
Merged: WoosukKwon merged 8 commits into vllm-project:main from CentML:eagle-fix-vocab-embedding-init on Jun 6, 2025.
Commits:
- ecee9fe — fix eagle logits bug (benchislett)
- ef6fae6 — refactor to use autoweightsloader (benchislett)
- c0083f0 — load eagle vocab embedding more carefully (benchislett)
- 758be48 — Merge remote-tracking branch 'upstream/main' into eagle-fix-vocab-emb… (benchislett)
- 44a6871 — remove broken allclose check (benchislett)
- 6a10c52 — fix EAGLE1 loading also (benchislett)
- f365cd1 — Merge remote-tracking branch 'upstream/main' into eagle-fix-vocab-emb… (benchislett)
- b197594 — update tests (benchislett)
Review discussion:
@WoosukKwon, can you share whether `gc.collect()` and `torch.cuda.empty_cache()` are fine here? Maybe there is some reason they were not already added before. I believe this was added because we delete a torch tensor after allocation. In case we decide it's better to avoid these new GC calls, an alternative approach would be to first load the draft model weights from the checkpoint, determine whether the draft vocab embedding is needed, and then pass that information to draft model instantiation, which can skip allocating the draft vocab and achieve the same objective.
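For context, a minimal sketch of the pattern under discussion: after re-pointing the draft model at the target model's embedding, the originally allocated tensor is explicitly released so the memory profiler sees the freed memory. The attribute path `draft_model.model.embed_tokens` and the helper name are stand-ins for illustration, not vLLM's actual code layout.

```python
import gc

import torch
import torch.nn as nn


def drop_redundant_draft_embedding(draft_model: nn.Module,
                                   target_embed: nn.Embedding) -> None:
    """Share the target model's vocab embedding with the draft model,
    then release the draft model's own (now-unreferenced) allocation.
    Hypothetical helper; attribute names are assumptions."""
    # The tensor the draft model allocated at construction time becomes
    # unreferenced once we point it at the shared embedding.
    draft_model.model.embed_tokens = target_embed

    # Without these calls, the dropped tensor can still be counted during
    # memory profiling: gc.collect() clears lingering Python references,
    # and empty_cache() returns cached CUDA blocks to the driver so that
    # torch.cuda.mem_get_info() reflects the true free memory.
    gc.collect()
    torch.cuda.empty_cache()
```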
I think the current approach makes sense, as enforcing GC and clearing the torch cache seem like natural choices to improve the accuracy of the memory profiler.
If we foresee any issues with calling GC/cleanup in this way, then I'm on board for doing it the other way
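A sketch of "the other way" described above, i.e. inspecting the checkpoint before construction so the draft vocab embedding is never allocated in the first place. This assumes a safetensors checkpoint; the key name, helper name, and constructor flag are hypothetical, not vLLM's API.

```python
from safetensors import safe_open


def checkpoint_has_embedding(checkpoint_path: str) -> bool:
    """Hypothetical pre-check: peek at the checkpoint's tensor names
    without loading any weights, so model construction can skip
    allocating a draft vocab embedding it would only throw away."""
    with safe_open(checkpoint_path, framework="pt") as f:
        # Key name mirrors a typical Llama layout; the actual draft
        # checkpoint may use a different name.
        return "model.embed_tokens.weight" in f.keys()


# The result would then be threaded into instantiation, e.g. (hypothetical):
# draft = EagleDraftModel(config,
#                         allocate_embedding=checkpoint_has_embedding(path))
```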
I agree with @ekagra-ranjan's concern, though I don't see a clear problem. Let's keep this in mind and revisit if any issue arises.