
Add StableLM #28810

Merged
11 commits merged into huggingface:main on Feb 14, 2024

Conversation

jon-tow (Contributor) commented Feb 1, 2024

What does this PR do?

This PR adds modeling support for StableLM 3B 4E1T (as well as StableLM 2 1.6B) based models.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

Notes

TODO: The current online implementation uses an early naming scheme for the model_type:

  "model_type": "stablelm_epoch",

I've temporarily created a development model repository, https://huggingface.co/jon-tow/stablelm-3b-4e1t-dev, for unit testing and the config archive mapping; both need to be updated before merging.
Is there a better way to handle this? I've noticed a similar issue in this Phi model PR discussion.
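As a hypothetical sketch of the config migration the TODO describes (field values beyond model_type are illustrative, and "stablelm" is assumed to be the type the integration registers), the hub-side change amounts to renaming the early identifier and dropping any custom-code auto_map entry:

```python
import json

# Hypothetical excerpt of the current hub config, which still uses the
# early naming scheme for model_type (other fields are illustrative).
old_config = {
    "model_type": "stablelm_epoch",
    "hidden_size": 2560,
    "num_hidden_layers": 32,
}

def migrate_model_type(config: dict, new_type: str = "stablelm") -> dict:
    """Return a copy of the config with model_type renamed and any
    custom-code auto_map entry dropped (the modeling code would now
    live in transformers itself)."""
    migrated = dict(config)
    migrated["model_type"] = new_type
    migrated.pop("auto_map", None)
    return migrated

new_config = migrate_model_type(old_config)
print(json.dumps(new_config, indent=2))
```

In practice this would be done as a PR against the hub repo's config.json, as discussed later in this thread.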

@ArthurZucker (Collaborator)

Hey! Thanks for contributing!
We usually recommend starting by adding the model on the Hub, to allow for quick distribution!
See the tutorial here

jon-tow (Author) commented Feb 1, 2024

Hello!

We usually recommend starting by adding the model on the Hub, to allow for quick distribution!

The model is already on the hub here but uses custom modeling code. Is your suggestion to simply rename the model_type in the config.json and remove the custom implementation? Sorry if I'm misinterpreting this!

@ArthurZucker (Collaborator)

No, what I mean is that I think it's fine to keep it on the Hub! 🤗
We usually go for an integration when it's really asked for by the community (lots of activity on the repo, lots of activity on the issue for adding it here, etc.)!
Though it's really great that you want to contribute!
If you still want to add it, I'd recommend making it as close as possible to other modeling files like Llama or Persimmon; otherwise, good that you created a dev repo 👍🏻

@jon-tow jon-tow marked this pull request as ready for review February 6, 2024 23:05
@ArthurZucker (Collaborator) left a comment


Looks already mergeable! Good work there 😉

README.md Outdated Show resolved Hide resolved
src/transformers/models/stablelm/__init__.py Outdated Show resolved Hide resolved
src/transformers/models/stablelm/__init__.py Outdated Show resolved Hide resolved
src/transformers/models/stablelm/modeling_stablelm.py Outdated Show resolved Hide resolved
jon-tow (Author) commented Feb 8, 2024

Hi, @ArthurZucker; thanks for the quick review! I'd like to point out that the recent commit 097272f removes a copied-from comment from `StableLmModel`, because `PersimmonModel` does not yet support flash-attn and the added `_attn_implementation` field breaks the `make repo-consistency` check. Let me know if you can suggest a workaround 🙏
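For readers unfamiliar with the check: `make repo-consistency` verifies that code marked with a `# Copied from` comment stays identical to its reference. The toy sketch below (not the actual `utils/check_copies.py` logic; the function names and bodies are invented) shows why adding an extra code path to the copy trips the check:

```python
def persimmon_forward(x):
    # Reference implementation: a single plain code path.
    return x * 2

# "Copied from" persimmon_forward, then modified to add an extra branch
# (standing in for the flash-attn / _attn_implementation handling);
# this divergence is what the consistency check reports.
def stablelm_forward(x):
    if x < 0:  # hypothetical extra branch the reference lacks
        return 0
    return x * 2

def bodies_match(ref, copy) -> bool:
    """Compare compiled code objects as a cheap stand-in for the textual
    comparison the real consistency check performs."""
    return (ref.__code__.co_code == copy.__code__.co_code
            and ref.__code__.co_consts == copy.__code__.co_consts)

print(bodies_match(persimmon_forward, stablelm_forward))  # prints False
```

The usual workarounds are to drop the copied-from marker (as this commit does) or to upstream the new feature to the reference model first.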

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment


Great work there!
Looks good to me; would you just mind me merging #27931 first?

This would mean you might have to use copied-from Mistral instead of Llama. Otherwise I'll merge this one and rebase on my side!

src/transformers/models/auto/configuration_auto.py Outdated Show resolved Hide resolved
logger = logging.get_logger(__name__)

STABLELM_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"jon-tow/stablelm-3b-4e1t-dev": "https://huggingface.co/jon-tow/stablelm-3b-4e1t-dev/resolve/main/config.json",
Collaborator


let's not forget to use the original repo here! (opening a PR to the repo to upload the new config, etc.)

jon-tow (Author) commented Feb 8, 2024


Thanks for the reminder! I've opened draft PRs for the base models:

  1. https://huggingface.co/stabilityai/stablelm-3b-4e1t/discussions/10
  2. https://huggingface.co/stabilityai/stablelm-2-1_6b/discussions/6

At what point should these be merged? I assume after the next release of transformers?

Collaborator


Yes!

src/transformers/models/stablelm/configuration_stablelm.py Outdated Show resolved Hide resolved
src/transformers/models/stablelm/configuration_stablelm.py Outdated Show resolved Hide resolved
src/transformers/models/stablelm/modeling_stablelm.py Outdated Show resolved Hide resolved
@ArthurZucker (Collaborator)

Could you rebase on main and make sure CIs are all green? 🤗 I can help if you can't finish all of them
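The requested workflow, sketched in a throwaway repository (the file names and commit messages are illustrative); on the real fork one would instead `git fetch upstream`, `git rebase upstream/main`, and `git push --force-with-lease`:

```shell
# Simulate "main moved ahead while the feature branch was in flight",
# then replay the feature commit on top of the new main.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email "dev@example.com"   # throwaway identity for the demo
git config user.name  "dev"
echo base > model.py
git add model.py && git commit -qm "initial commit"
git checkout -qb add-stablelm             # feature branch
echo "class StableLmModel: pass" >> model.py
git commit -qam "Add StableLM"
git checkout -q main                      # meanwhile, main moves ahead
echo "# attention refactor" > other.py
git add other.py && git commit -qm "refactor attention"
git checkout -q add-stablelm
git rebase -q main                        # feature commit now sits above main
git log --oneline
```

After the rebase, the feature branch contains main's new commit plus the replayed feature commit, which is what lets the CI run against the current main.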

jon-tow (Author) commented Feb 12, 2024

Can you please help with the test_hubs workflow? It errors with `FAILED tests/trainer/test_trainer.py::TrainerIntegrationWithHubTester::test_push_to_hub_with_saves_each_epoch - AssertionError: 'Training in progress, epoch 1' not found in ['Training in progress, epoch 3', 'Training in progress, epoch 2', 'initial commit']` (here). Not sure how to fix this one up 😅 Thanks!

@jon-tow jon-tow changed the title from "[WIP] Add StableLM" to "Add StableLM" on Feb 13, 2024
@jon-tow jon-tow force-pushed the add-stablelm branch 2 times, most recently from 9986ad1 to 6a6a0ca, on February 13, 2024 04:34
@ArthurZucker (Collaborator) left a comment


Double checked, LGTM!

@ArthurZucker ArthurZucker merged commit de6029a into huggingface:main Feb 14, 2024
21 of 22 checks passed
sbucaille pushed a commit to sbucaille/transformers that referenced this pull request Feb 14, 2024
* Add `StableLM`

* fix(model): re-create from `huggingface-cli add-new-model-like persimmon`

* fix: re-add changes to address comments

* fix(readme): add links to paper

* fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref

* fix(tests): re-add `@slow` decorator to integration tests

* fix(tests): import slow...

* fix(readme_hd): remove whitespace edit

* fix(tokenizer): auto tokenizer tuple

* skip doctests for `modeling_stablelm`
itazap pushed a commit that referenced this pull request May 14, 2024
3 participants