
Fix TransformerTokenizer pad_token_id reset #1068

Merged (1 commit) on Jul 27, 2024

Conversation

ispobock (Contributor)

For some LLM models (such as chatglm3-6b), the pad_token_id is defined as 0 and is immutable. We should only reset it when it is undefined.
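The fix described above amounts to guarding the reset with a `None` check. A minimal sketch of that guard, using a dummy stand-in class rather than a real Hugging Face tokenizer (the actual change in Outlines' `TransformerTokenizer` may differ in detail):

```python
class DummyTokenizer:
    """Illustrative stand-in for a Hugging Face tokenizer (not the real class)."""

    def __init__(self, pad_token_id, eos_token_id):
        self.pad_token_id = pad_token_id
        self.eos_token_id = eos_token_id


def ensure_pad_token_id(tokenizer):
    # Reset pad_token_id only when it is undefined. Models such as
    # chatglm3-6b define pad_token_id as 0, and overwriting it
    # (or even re-assigning it) can fail or change model behavior.
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id
    return tokenizer
```

With this guard, a tokenizer that already defines `pad_token_id = 0` is left untouched, while one with no pad token falls back to its EOS token id.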

ispobock (Contributor, Author) commented Jul 27, 2024

@shawnz @rlouf Could you help review?

lapp0 (Contributor) left a comment

Great find, thanks for fixing!

@rlouf rlouf merged commit bb92745 into dottxt-ai:main Jul 27, 2024
7 checks passed
@rlouf rlouf added the transformers Linked to the `transformers` integration label Jul 27, 2024
3 participants