Skip to content

add shared experts for upcoming Granite 4.0 language models #35894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Feb 14, 2025

Conversation

mayank31398
Copy link
Contributor

This PR adds support for shared experts in GraniteMoE model class for upcoming Granite 4.0 language models.
@ArthurZucker

@mayank31398
Copy link
Contributor Author

@ArthurZucker can you merge this?
all checks have passed

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! As always for us this will be a new model, with modular it should be super easy to add however! https://huggingface.co/docs/transformers/en/modular_transformers 🤗

@shawntan
Copy link
Contributor

shawntan commented Feb 5, 2025

We're adding an additional feature (shared experts) that doesn't break past checkpoints, and is an extension of our own model class. Would every extension entail a new model class?

@ArthurZucker
Copy link
Collaborator

Yes 🤗 I am sorry but that is the way we have been handling every single model so far!

@shawntan
Copy link
Contributor

shawntan commented Feb 7, 2025

Not sure how to get the tests to pass , some are not due to the changes I've made.

shawntan and others added 10 commits February 7, 2025 13:56
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@mayank31398 mayank31398 marked this pull request as draft February 11, 2025 17:17
Ssukriti and others added 3 commits February 11, 2025 16:10
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@mayank31398 mayank31398 marked this pull request as ready for review February 13, 2025 00:41
@mayank31398
Copy link
Contributor Author

@ArthurZucker the PR is ready, please review.
The failing tests seem unrelated

@ArthurZucker ArthurZucker self-requested a review February 13, 2025 09:55
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super clean! Super nice!



class GraniteMoeSharedForCausalLM(GraniteMoeForCausalLM):
_tied_weights_keys = ["lm_head.weight"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the mlp is shared, should it appear here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it shouldnt.
shared means its a shared in sense of experts (not across layers)

@mayank31398
Copy link
Contributor Author

Thanks for approving, please merge as soon as possible :)

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@Ssukriti
Copy link
Contributor

I have updated with the main branch , made corresponding changes and all checks have passed :)

@ArthurZucker ArthurZucker merged commit a570e2b into huggingface:main Feb 14, 2025
23 checks passed
@mayank31398 mayank31398 deleted the shared-experts branch February 14, 2025 15:58
ArthurZucker pushed a commit that referenced this pull request Feb 17, 2025
* Modular GraniteMoE with shared Experts.

Signed-off-by: Shawn Tan <shawntan@ibm.com>

* Modified

* Import order.

* Modified for style

* Fix space.

* Test

* Remove extra granitemoe file.

* New converted file and tests

* Modified __init__ files.

* Formatting.

* Dummy PT objects

* register granitemoe shared model

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix linting of a file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix import in modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add documentation

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update docstrings

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix docstrings in config class

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* merge main

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Shawn Tan <shawntan@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Shawn Tan <shawntan@ibm.com>
Co-authored-by: Shawn Tan <shawn@wtf.sg>
Co-authored-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Sukriti Sharma <Ssukriti@users.noreply.github.com>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Feb 21, 2025
…ace#35894)

* Modular GraniteMoE with shared Experts.

Signed-off-by: Shawn Tan <shawntan@ibm.com>

* Modified

* Import order.

* Modified for style

* Fix space.

* Test

* Remove extra granitemoe file.

* New converted file and tests

* Modified __init__ files.

* Formatting.

* Dummy PT objects

* register granitemoe shared model

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix linting of a file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix import in modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add documentation

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update docstrings

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix docstrings in config class

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* merge main

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Shawn Tan <shawntan@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Shawn Tan <shawntan@ibm.com>
Co-authored-by: Shawn Tan <shawn@wtf.sg>
Co-authored-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Sukriti Sharma <Ssukriti@users.noreply.github.com>
@EwoutH
Copy link

EwoutH commented May 4, 2025

Is the public release of Granite-4.0-Tiny-Preview in any way relevant to this PR? (like does it warrant any follow-up work, additional validation/testing/CI, etc.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants