Conversation

@Pasewark
Contributor

Right now, during generation the token embedding is computed for every token in the context, and then only the last token's embedding is used. This change sends just the last token to the embedding layer during inference, so the lookup only has to be done for that one token.
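For illustration, a minimal sketch of the difference; the embedding layer, shapes, and variable names here are assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Illustrative only: a standalone embedding table and a dummy context.
token_emb = nn.Embedding(50_000, 512)
tokens = torch.randint(0, 50_000, (1, 1024))   # (batch, context length)

# before: embed the whole context, then discard everything but the last position
last_emb_before = token_emb(tokens)[:, -1]     # lookup runs for all 1024 tokens

# after: slice to the last token first, so only one embedding row is fetched
last_emb_after = token_emb(tokens[:, -1:])[:, -1]

assert torch.allclose(last_emb_before, last_emb_after)
```

The result is identical either way; the slicing just avoids the redundant lookups for positions whose embeddings are never used in that step.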

@lucidrains
Owner

@Pasewark lgtm, thank you Eric!

@lucidrains lucidrains merged commit e7a867c into lucidrains:main Mar 24, 2025
1 check passed