Add OLMoE #32406
Conversation
Hey! Feel free to ping me for a review once ready!
It's ready! :) I will update the README & double-check the slow tests once the model is released, if that's fine!
@ArthurZucker would be great if we could get it reviewed soon 😇
We'll release the model on Tuesday, would be amazing if we could have this approved by then!
Oops, sorry, reviewing now.
Sorry for the late review!
Mostly missing "Copied from" statements; should be good to go otherwise.
Could go the extra mile to have a compile-compatible version of the MoE block!
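As a rough illustration of the compile-compatible idea (a generic sketch with hypothetical names and sizes, not the PR's actual code): instead of gathering the tokens routed to each expert with data-dependent indexing, every token can be run through every expert and masked by its routing weight, so all tensor shapes stay static under torch.compile.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompileFriendlySparseMoeBlock(nn.Module):
    # Illustrative sketch only: hypothetical names and a plain SiLU MLP per expert,
    # not the PR's implementation.
    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size, bias=False),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor):
        batch, seq_len, hidden = hidden_states.shape
        flat = hidden_states.view(-1, hidden)                    # (tokens, hidden)
        router_logits = self.gate(flat)                          # (tokens, experts)
        routing_weights = F.softmax(router_logits, dim=-1)
        topk_weights, topk_idx = torch.topk(routing_weights, self.top_k, dim=-1)

        final = torch.zeros_like(flat)
        for expert_idx, expert in enumerate(self.experts):
            # Weight of this expert for every token (zero when it was not selected);
            # no data-dependent indexing, so shapes stay static.
            weight = torch.where(
                topk_idx == expert_idx, topk_weights, torch.zeros_like(topk_weights)
            ).sum(dim=-1, keepdim=True)
            final = final + weight * expert(flat)

        return final.view(batch, seq_len, hidden), router_logits
```

The trade-off is extra compute, since every expert now sees every token, so real implementations typically keep a gather/scatter path for eager execution as well; the sketch only shows the shape-static routing.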
@@ -0,0 +1,281 @@
# Licensed under the Apache License, Version 2.0 (the "License");
The date is missing here!
Is that mandatory? Maybe the entire header can just be removed? It seems redundant to have this header in every file when the license is clear from the repo.
😄 Yeah, it is redundant, but it's the license convention AFAIK; it's a nit though, don't worry.
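For reference, the header in question normally opens with a copyright line carrying the year, followed by the standard Apache 2.0 notice. An illustrative version is below; the year and copyright holders here are assumptions, not the PR's actual header.

```python
# Copyright 2024 the OLMoE authors and the HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```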
return final_hidden_states, router_logits

class OlmoeDecoderLayer(nn.Module):
Same here.
Different from the others because we have no shared expert.
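For context, the "Copied from" statements being requested are the convention transformers uses to mark code duplicated from another model so the repo's consistency checks keep the copies in sync. A hypothetical example follows; the source class and rename mapping are illustrative, not necessarily what the PR ended up using.

```python
import torch
import torch.nn as nn

# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->Olmoe
class OlmoeRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # RMS normalization in float32 for numerical stability, cast back afterwards
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)
```

Since the MoE block here has no shared expert, it differs from the shared-expert MoE blocks elsewhere in the library, which is the point being made above about why that module cannot simply carry a "Copied from" annotation.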
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Could you make sure you've rebased and the CIs are green? Will review again after that!
LGTM, let's rebase and fix the CIs 🤗 (there is also a TODO in the .md).
Great, all passing! 🙌
Thanks for your contribution! 🔥
Is it a problem on my end if a higher native batch size during training results in higher losses (with DeepSpeed)? In the attached loss curves, the purple line is the higher-native-batch-size FFT run, and the orange line uses gradient accumulation instead of a higher native batch size. I'm guessing expert routing parallelism is maybe not properly handled for batch size > 1?
cc @Muennighoff, who would have a lot more clues than me 😉
Hm, not sure about this - are other models the same for you in both scenarios? How about other MoEs?
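One place where batch composition can matter for an MoE in principle is the load-balancing auxiliary loss, which is computed from the router statistics of a single forward pass, so a larger native batch and gradient accumulation over micro-batches do not see identical expert-balance statistics. This is only a speculative pointer, not a diagnosis of the run above. The sketch below is a simplified Switch-Transformers-style balancing loss with assumed names, not the library's exact function.

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int, top_k: int) -> torch.Tensor:
    """Simplified auxiliary loss over the tokens of one forward pass.

    router_logits: (num_tokens, num_experts), concatenated over the batch.
    Encourages both the fraction of assignments per expert and the mean router
    probability per expert to stay close to uniform.
    """
    routing_weights = F.softmax(router_logits, dim=-1)            # (tokens, experts)
    _, selected = torch.topk(routing_weights, top_k, dim=-1)      # (tokens, top_k)
    expert_mask = F.one_hot(selected, num_experts).float()        # (tokens, top_k, experts)

    tokens_per_expert = expert_mask.mean(dim=(0, 1))              # fraction of assignments per expert
    prob_per_expert = routing_weights.mean(dim=0)                 # mean router probability per expert
    return num_experts * torch.sum(tokens_per_expert * prob_per_expert)
```

Under gradient accumulation this statistic is computed per micro-batch and averaged, rather than once over the full batch, which is one small way the two setups can legitimately differ even when the language-modeling loss itself is averaged the same way.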
* Add OLMoE
* Add OLMoE
* Updates
* Make norm optional; add keys
* Add output
* Add
* Fix dtype
* Fix eos config
* Update
* Add OLMoE
* Fix OLMoE path
* Format
* Format
* Rmv copy statement
* Rmv copy statement
* Format
* Add copies
* Cp rotary
* Fix aming
* Fix naming
* Update RoPE integration; num_logits_to_keep; Add copy statements
* Add eps to config
* Format
* Add aux loss
* Adapt router_aux_loss_coef
* Update md
* Adapt
* adapt tests
What does this PR do?
Before submitting
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
The model will be released in ~1 week - can we already review this so that we can merge right upon release?