
Add Doge model #35891

Merged: 113 commits merged into huggingface:main on Jul 8, 2025

Conversation

LoserCheems
Contributor

@LoserCheems LoserCheems commented Jan 25, 2025

What does this PR do?

Fixes #35889
Supports the Doge-SLM family of small language models.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

to: @ArthurZucker

@Rocketknight1
Member

Hi @LoserCheems, and thanks for the PR! The model looks cool and I like the paper too, but we're trying to add new models using the modular format going forward. You can see a guide here, and an example modular PR here.

If you write modular_doge.py with inheritance then the configuration_ and modeling_ files will be auto-generated. This makes the PR much shorter and easier to review.
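
For illustration, a minimal sketch of what a modular file can look like (the class bodies below are placeholders, not the actual Doge implementation):

```python
# modular_doge.py -- illustrative sketch only. Classes that match an existing
# model are declared by inheritance; the modular converter then expands them
# into standalone configuration_doge.py and modeling_doge.py files.
# (Inside the library these would normally be relative imports.)
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaRMSNorm


class DogeConfig(LlamaConfig):
    model_type = "doge"


class DogeRMSNorm(LlamaRMSNorm):
    pass
```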

@LoserCheems
Contributor Author

Thank you @Rocketknight1, I've written modular_doge.py, but I'm sorry, I don't quite understand modular and may make some mistakes...

@Rocketknight1
Member

Hi @LoserCheems yes, don't worry, it's a new feature so everyone is a bit confused about it! 😅

Your modular_doge.py file looks good! The next step is to find code that's copied from other models in transformers, and replace that with inheritance. This will make modular_doge.py much smaller, but the full modeling_doge.py will still be generated without inheritance. You can see some examples in the Qwen2.5 PR:

[Screenshot: modular inheritance example from the Qwen2.5 PR]

Classes like DogeMLP and DogeForSequenceClassification look like they use code from other library classes like Llama, so you could just inherit those instead in the modular file. You can run make fix-copies to regenerate modeling_doge.py and confirm that it still works.
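
For example, a hedged sketch of what that could look like in modular_doge.py (illustrative only; the real Doge classes may need to override more than this):

```python
# Illustrative sketch: instead of copying the Llama code, inherit from the
# Llama classes and only override what actually differs. The generated
# modeling_doge.py will contain the fully expanded, standalone classes.
from transformers.models.llama.modeling_llama import (
    LlamaForSequenceClassification,
    LlamaMLP,
)


class DogeMLP(LlamaMLP):
    pass


class DogeForSequenceClassification(LlamaForSequenceClassification):
    pass
```

Running make fix-copies afterwards (as mentioned above) regenerates modeling_doge.py so you can confirm the expanded code still matches.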

@LoserCheems
Contributor Author

Thank you @Rocketknight1. In fact, because the weight and config names differ, there aren't many classes that can be inherited directly; in total, RMSNorm, RotaryEmbedding, and DogeForSequenceClassification are inherited from Llama.

@Rocketknight1
Member

Hi @LoserCheems, the last code quality error is caused by an unprotected import torch. These need to be guarded by if is_torch_available because some people have JAX-only or TF-only systems, and unguarded imports can make it impossible for them to use the library!
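
For anyone hitting the same check, a minimal sketch of the guard pattern (shown with the public import path; inside the library it would be a relative import):

```python
# Guarded torch import: the module stays importable on JAX-only or TF-only
# installs, and torch-dependent code only runs when torch is present.
from transformers.utils import is_torch_available

if is_torch_available():
    import torch
    from torch import nn
```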

There are unrelated failing tests under tests_torch - you can ignore them for now, but once you can get code quality green then let me know and I'll review the PR.

@LoserCheems
Contributor Author

Sorry @Rocketknight1, I mistakenly imported PretrainedConfig from modeling_utils, which is now fixed.
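
For reference, a small sketch of the corrected import (PretrainedConfig lives in configuration_utils, not modeling_utils):

```python
# Wrong (what the earlier commit did):
# from transformers.modeling_utils import PretrainedConfig

# Correct import location for the base config class:
from transformers.configuration_utils import PretrainedConfig


class DogeConfig(PretrainedConfig):
    model_type = "doge"
```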

@LoserCheems
Contributor Author

Hmm 🤓, there seems to be something wrong with the RotaryEmbedding inherited from Llama.

@ArthurZucker
Collaborator

ArthurZucker commented Jun 25, 2025

Before merging, can you fix the test?
You can inherit the test like it's done in the Gemma3 modeling tests.
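
A rough sketch of the idea (mixin and class names are illustrative; the real Doge test file follows whatever the Gemma3 tests do and reuses the shared testers instead of redefining everything):

```python
# Illustrative skeleton only, not the actual tests/models/doge test file.
import unittest

from transformers import DogeConfig, is_torch_available
from transformers.testing_utils import require_torch

if is_torch_available():
    from transformers import DogeForCausalLM, DogeModel


@require_torch
class DogeModelTest(unittest.TestCase):
    # In the real test file this attribute is consumed by the shared
    # ModelTesterMixin rather than by unittest itself.
    all_model_classes = (DogeModel, DogeForCausalLM) if is_torch_available() else ()

    def test_config_defaults(self):
        config = DogeConfig()
        self.assertEqual(config.model_type, "doge")
```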

@LoserCheems
Contributor Author

Hi @ArthurZucker, all the tests have passed!

@Cyrilvallez
Member

run-slow: doge

Contributor

github-actions bot commented Jul 8, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/doge']
quantizations: [] ...

Contributor

github-actions bot commented Jul 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, doge

Member

@Cyrilvallez Cyrilvallez left a comment


All right! I fixed what was missing myself, based on our most recent refactors! This is now good to go, merging!
Thanks again for your great work, and for bearing with us! 🤗🚀

@Cyrilvallez Cyrilvallez merged commit d8590b4 into huggingface:main Jul 8, 2025
20 of 23 checks passed
rjgleaton pushed a commit to rjgleaton/transformers that referenced this pull request Jul 17, 2025
* Add Doge Model

* Fix code quality

* Rollback an error commit

* Fix config for open-source weights

* Revert "Fix config for open-source weights"

This reverts commit 229cdca.

* Add modular_doge

* Update Doge inherits from Llama

* Fix import bug

* [docs] Add usage of doge model

* Fix Doge import pretrainedconfig from modeling_utils to configuration_utils

* [docs] remove trust remote code from doge

* Fix dynamo bug in doge model

* Update docstrings

* Import apply_rotary_pos_emb and repeat_kv from Llama

* Fix all nits

* Fix code quality

* Fix some bugs

* Fix code quality

* Remove inherited `_update_causal_mask` from Llama
This leads to incorrect weight initialization.

* Fix the wrong tensor orderings in DogeCDMoE

* Fix attention mask bug
We have to provide attention_mask for dynamic mask computation

* Modify most implementations to inherit from Llama
But there are two problems:
1. `flex_attention_forward` is not updated properly
2. `Example` error in the forward method of DogeForCausalLM

* Modify CDMoE for batch efficient implementation

* Uniform MoE configuration names, just like QwenMoE

* Fix code quality

* Fix code quality

* Fix code quality

* Add tp plan of CDMoE Module

* Hybrid DMA with sliding window

* Update valid tokens greater than window size

* Fix code quality

* Add `convert_doge_weights_to_hf`

* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py

* Fix nits in modular_doge

* Fix code quality

* Fix all nits

* Fix all nits

* Make sure the attention function is updated inside the class

* Fix code quality issues in the Doge model and add a test for it

* Fix `test_generate`

* Fix code quality

* Fix nits following suggestions

* Fix code quality

* Fix code quality issues

* Fix nits

* Fix code quality nits

* Fix the missing parameters in the configuration.

* Fix the missing parameters in the configuration.

* Fix nits

* Add initialization of attention

* Fix last nits

* Simplify dynamic mask generation logic

* Rename router_logits to gate_logits for matching latest changes of MixtralModel

* Rename typings for matching latest changes of MixtralModel

* Fixes typo in comment

* Update src/transformers/models/doge/modular_doge.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix code quality issues to match other modular

* Fix code quality issues to match other modular

* Fix the static compilation errors

* Update model weights link

* Fix code quality issues to match other modular

* reapply modular and support for new outputs

* style

* simplify a lot

* fix import location

* reapply modular

* fix

* fix integration test

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

Successfully merging this pull request may close these issues.

Request to add Doge