Make MosaicGPT a HuggingFace PreTrainedModel #243

Merged
merged 103 commits into from
Mar 22, 2023


dakinggg
Collaborator

@dakinggg dakinggg commented Mar 17, 2023

(You can ignore the long list of commits this PR claims to have; it is only there because I started working from the 0.13.2 merge PR before it was merged to main.)

This PR makes MosaicGPT a subclass of transformers.PreTrainedModel, which lets us call generate and other HF utilities. There are still some TODOs, such as supporting KV caching, but I think all of those should be called out with explicit errors in the code, and they should be easy to add once this base is in. This PR will wait for the KV caching PR to be merged to main before merging.
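For readers unfamiliar with the pattern, here is a minimal sketch of subclassing `transformers.PreTrainedModel`. It is not the code from this PR: `TinyGPTConfig`, `TinyGPT`, and `save_and_reload` are hypothetical names, and the model is a toy embedding plus output projection rather than MosaicGPT. It only shows that pairing the model with a `PretrainedConfig` subclass gives `save_pretrained` and `from_pretrained` without extra code. Making `generate` work needs more than this (for example `prepare_inputs_for_generation`, and the details vary by `transformers` version), so it is left out.

```python
import tempfile

import torch
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel
from transformers.modeling_outputs import CausalLMOutputWithPast


class TinyGPTConfig(PretrainedConfig):
    # Hypothetical config for illustration; not the PR's config class.
    model_type = "tiny_gpt_sketch"

    def __init__(self, vocab_size=32, d_model=16, **kwargs):
        self.vocab_size = vocab_size
        self.d_model = d_model
        super().__init__(**kwargs)


class TinyGPT(PreTrainedModel):
    # Hypothetical toy model: token embedding followed by an LM head.
    config_class = TinyGPTConfig

    def __init__(self, config):
        super().__init__(config)
        self.wte = nn.Embedding(config.vocab_size, config.d_model)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        # attention_mask is accepted but unused; there is no attention here.
        logits = self.lm_head(self.wte(input_ids))
        return CausalLMOutputWithPast(logits=logits)


def save_and_reload(model):
    # Round-trip the model through the HF on-disk format.
    with tempfile.TemporaryDirectory() as tmp:
        model.save_pretrained(tmp)
        return TinyGPT.from_pretrained(tmp)


torch.manual_seed(0)
model = TinyGPT(TinyGPTConfig())
reloaded = save_and_reload(model)

# Every parameter should survive the round trip unchanged.
reloaded_state = reloaded.state_dict()
same = all(torch.equal(p, reloaded_state[k]) for k, p in model.state_dict().items())
print(same)
```

A check like `same` is roughly what a save_pretrained/from_pretrained test could assert, alongside a comparison of the reloaded config values.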

Testing todos:

  • make sure generate works properly on a trained model, both with and without padding
    The output below is from the 7B model: first batched (i.e. padded), then unbatched (i.e. not padded).
['I believe that MosaicML is the best company in the world. Someday, I hope to be a part of it.\nI am a senior at the University of Texas at', '<|endoftext|><|endoftext|>Q: What city is the capital of the United States?\nA: Washington, D.C. The capital of the United States is Washington, D.C.\n', '<|endoftext|><|endoftext|><|endoftext|>Q: What is the capital of the United States?\nA: Washington, D.C. The capital of the United States is Washington, D.C.\n'] 2023-03-20 22:53:23

I believe that MosaicML is the best company in the world. Someday, I hope to be a part of it.
I am a senior at the University of Michigan,

Q: What city is the capital of the United States?
A: Washington, D.C. The capital of the United States is Washington, D.C.

Q: What is the capital of the United States?
A: Washington, D.C. The capital of the United States is Washington, D.C.
  • compare training runs before and after this PR; this needs to be tested with all three attention implementations
    (the base run is from before this PR and uses flash attention)

Screen Shot 2023-03-20 at 4 07 59 PM

  • add tests for save_pretrained and from_pretrained
  • add KV caching support after Vitaliy's PR
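On the padding item in the list above: in the batched output the shorter prompts start with `<|endoftext|>` tokens, which is consistent with left padding. As a rough sketch of what that looks like at the token level, the snippet below left-pads a batch and builds the matching attention mask in plain Python. The helper names `left_pad` and `strip_left_pad`, the token ids, and `pad_id=0` are all made up for illustration; the PR itself relies on the tokenizer for this.

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad every sequence to the longest length in the batch.

    Returns (input_ids, attention_mask), where the mask is 0 on pad
    positions and 1 on real tokens.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


def strip_left_pad(row: List[int], mask: List[int]) -> List[int]:
    """Drop the leading pad positions from a (possibly longer) output row."""
    n_pad = mask.count(0)
    return row[n_pad:]


# Hypothetical token ids for three prompts of different lengths.
prompts = [[5, 6, 7, 8], [5, 6], [9]]
input_ids, attention_mask = left_pad(prompts, pad_id=0)

for ids, mask in zip(input_ids, attention_mask):
    print(ids, mask)
```

With left padding, the last position of every row is a real token, so new tokens are appended directly after the prompt. Comparing `strip_left_pad` of each batched row against the unbatched result is one way to check that padding does not change the generation when decoding is deterministic.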

@dakinggg dakinggg requested a review from dskhudia March 21, 2023 18:30
@dakinggg
Collaborator Author

Screen Shot 2023-03-21 at 5 09 45 PM

1bs training again

@vchiley vchiley self-requested a review March 22, 2023 00:36
Contributor

@vchiley vchiley left a comment


Chonker PR!

Review threads (all resolved):
examples/llm/src/models/configuration_mosaic_gpt.py (outdated)
examples/llm/src/models/configuration_mosaic_gpt.py (outdated)
examples/llm/src/models/layers/attention.py
examples/llm/src/models/mosaic_gpt.py
examples/llm/src/models/mosaic_gpt.py
examples/llm/src/models/mosaic_gpt.py (outdated)
examples/llm/src/models/mosaic_gpt.py
examples/llm/tests/test_model.py
Co-authored-by: Vitaliy Chiley <6439018+vchiley@users.noreply.github.com>
Contributor

@vchiley vchiley left a comment


TY for the updates!

@dakinggg dakinggg merged commit 226e371 into mosaicml:main Mar 22, 2023
@dakinggg dakinggg deleted the mgpt_to_hf branch September 9, 2023 22:52
7 participants