Conversation

@KyleMylonakisProtopia (Contributor)

What does this PR do?

GPT OSS models ship with a conversion script that transforms the distributed quantized weights into a Hugging Face model in bfloat16. The script also converts aspects of the tokenizer.

As a side note, it also fixes a minor bug in the TikToken tokenizer converter that caused a hard crash (iterating over a NoneType object) when converting the tokenizer.

Fixes huggingface/accelerate#3882 (comment): this PR resolves an open issue in the Hugging Face Accelerate repository by making the conversion script functional.
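
A quick sanity check on the converted output (a minimal sketch; the output directory name is hypothetical):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

save_dir = "gpt-oss-bf16"  # hypothetical output directory of the conversion script
tokenizer = AutoTokenizer.from_pretrained(save_dir)
model = AutoModelForCausalLM.from_pretrained(save_dir, torch_dtype=torch.bfloat16)
print(model.dtype)  # expect torch.bfloat16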

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests? No

Who can review?

@ArthurZucker @Cyrilvallez

@Rocketknight1 (Member) left a comment


Conversion script updates look good! I'm a little wary about the change in the core convert_slow_tokenizer.py, though, so I might need a second opinion from an expert there.

print("Saving chat template...")
chat_template_path = os.path.join(save_dir, "chat_template.json")
with open(chat_template_path, "w") as f:
    json.dump({"chat_template": chat_template}, f, indent=2)
Member


We prefer raw chat_template.jinja in modern conversions!

Contributor Author


Would you like me to change the filename to chat_template.jinja?

Contributor Author


I changed the extension name as requested.
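
For reference, the updated save step presumably amounts to something like this (a sketch, assuming the same save_dir and chat_template variables as in the snippet above, not the exact diff):

print("Saving chat template...")
chat_template_path = os.path.join(save_dir, "chat_template.jinja")
with open(chat_template_path, "w") as f:
    f.write(chat_template)  # raw Jinja text, no JSON wrapper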

Comment on lines -1900 to +1903
-    tokenizer.add_special_tokens(
-        [AddedToken(token, normalized=False, special=True) for token in self.extra_special_tokens]
-    )
+    if self.extra_special_tokens is not None:
+        tokenizer.add_special_tokens(
+            [AddedToken(token, normalized=False, special=True) for token in self.extra_special_tokens]
+        )
Member


I'm not sure why we need changes in the core code! cc @itazap @ArthurZucker before I can approve this.

Contributor Author


It hard-crashes otherwise: self.extra_special_tokens can be None, and it is None in the conversion script.
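
The failure mode is plain Python (a standalone sketch, not the converter code itself):

extra_special_tokens = None  # stand-in for self.extra_special_tokens during conversion
try:
    tokens = [t for t in extra_special_tokens]  # iterating over None raises TypeError
except TypeError as err:
    print(err)  # 'NoneType' object is not iterable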

Contributor Author


Bumping for feedback from @itazap and @ArthurZucker.

Collaborator


LGTM 👍

Contributor Author


Great! How can we get this over the line? I would love to see this change in the Transformers 5.0.0 release.

@Cyrilvallez (Member)

cc @ArthurZucker

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_oss

@github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42901&sha=e48a61
