Add Doge model #35891
Conversation
Hi @LoserCheems, and thanks for the PR! The model looks cool and I like the paper too, but we're trying to add new models using the modular format going forward. You can see a guide here, and an example modular PR here. If you write a `modular_doge.py` file instead, the flat modeling code can be generated from it.
Thank you @Rocketknight1, I've written `modular_doge.py`!
Hi @LoserCheems, yes, don't worry, it's a new feature so everyone is a bit confused about it! 😅 Your classes, like the attention and MLP blocks, can inherit directly from the Llama equivalents wherever the code matches.
Thank you @Rocketknight1. In fact, because the weight names and config names are different, not many classes can be inherited directly.
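For context, this is roughly what the modular format being discussed looks like. It's a minimal sketch, not the merged `modular_doge.py`: where the code matches an existing model, a class simply subclasses it, and the converter unrolls these definitions into a flat `modeling_doge.py`.

```python
# Illustrative sketch of a modular file (not the merged modular_doge.py).
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    LlamaMLP,
    LlamaRMSNorm,
)


class DogeRMSNorm(LlamaRMSNorm):
    pass  # identical to Llama, so it is inherited as-is


class DogeMLP(LlamaMLP):
    pass


class DogeAttention(LlamaAttention):
    # Code that differs from Llama (e.g. Doge's dynamic mask attention)
    # would override forward() here instead of inheriting it.
    pass
```

The flat file is then regenerated with the converter script, e.g. `python utils/modular_model_converter.py --files_to_parse src/transformers/models/doge/modular_doge.py`, per the modular guide in the repo.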
Hi @LoserCheems, the last code quality error is caused by an unprotected import. There are also some unrelated failing tests.
Sorry @Rocketknight1, I mistakenly added an unprotected import.
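For reference, the usual guard pattern for an optional dependency in transformers looks roughly like this. This is a generic sketch of the convention, not the exact fix in this PR, and `make_causal_mask` is a hypothetical helper used only for illustration:

```python
from transformers.utils import is_torch_available

# Only touch torch at import time if it is actually installed; the library
# must remain importable even in environments without PyTorch.
if is_torch_available():
    import torch


def make_causal_mask(seq_len: int):
    """Hypothetical helper showing the guard pattern."""
    if not is_torch_available():
        raise ImportError("This function requires PyTorch to be installed.")
    # Lower-triangular boolean mask: position i may attend to positions <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
```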
Hmm 🤓, there seems to be something wrong with…
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Merge branch 'main' of https://github.com/huggingface/transformers into add-doge-model
Before merging, can you fix the tests?
Hi @ArthurZucker, all the tests have passed!
run-slow: doge
This comment contains run-slow, running the specified jobs: models: ['models/doge']
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, doge
All right! I fixed the missing pieces myself based on our most recent refactors! This is now good to go, merging!
Thanks again for your great work, and for bearing with us! 🤗🚀
* Add Doge Model
* Fix code quality
* Rollback an error commit
* Fix config for open-source weights
* Revert "Fix config for open-source weights"
  This reverts commit 229cdca.
* Add modular_doge
* Update Doge inherits from Llama
* Fix import bug
* [docs] Add usage of doge model
* Fix Doge import pretrainedconfig from modeling_utils to configuration_utils
* [docs] remove trust remote code from doge
* Fix dynamo bug in doge model
* Update docstrings
* Import apply_rotary_pos_emb and repeat_kv from Llama
* Fix all nits
* Fix code quality
* Fix some bugs
* Fix code quality
* Remove inherited `_update_causal_mask` from Llama
  This leads to incorrect weight initialization.
* Fix the wrong tensor orderings in DogeCDMoE
* Fix attention mask bug
  We have to provide attention_mask for dynamic mask computation
* Modify most implementations to inherit from Llama
  But there are two problems:
  1. `flex_attention_forward` is not updated properly
  2. `Example` error in the forward method of DogeForCausalLM
* Modify CDMoE for batch efficient implementation
* Uniform MoE configuration names, just like QwenMoE
* Fix code quality
* Fix code quality
* Fix code quality
* Add tp plan of CDMoE Module
* Hybrid DMA with sliding window
* Update valid tokens greater than window size
* Fix code quality
* Add `convert_doge_weights_to_hf`
* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py
* Fix nits in modular_doge
* Fix code quality
* Fix all nits
* Fix all nits
* Make sure the attention function is updated inside the class
* Fix code quality issues in the Doge model and add a test for it
* Fix `test_generate`
* Fix code quality
* Fix nits following suggestions
* Fix code quality
* Fix code quality issues
* Fix nits
* Fix code quality nits
* Fix the missing parameters in the configuration.
* Fix the missing parameters in the configuration.
* Fix nits
* Add initialization of attention
* Fix last nits
* Simplify dynamic mask generation logic
* Rename router_logits to gate_logits for matching latest changes of MixtralModel
* Rename typings for matching latest changes of MixtralModel
* Fixes typo in comment
* Update src/transformers/models/doge/modular_doge.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix code quality issues to match other modular
* Fix code quality issues to match other modular
* Fix the static compilation errors
* Update model weights link
* Fix code quality issues to match other modular
* reapply modular and support for new outputs
* style
* simplify a lot
* fix import location
* reapply modular
* fix
* fix integration test

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
What does this PR do?
Fixes #35889
Adds support for the Doge-SLM family of small language models.
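As a quick illustration, usage should look roughly like this once merged. The `SmallDoge/Doge-20M` checkpoint id is an assumption for the example; check the released model cards for the final names:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint id assumed for illustration; may differ from the released names.
model_id = "SmallDoge/Doge-20M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hey, how are you doing?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```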
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
to: @ArthurZucker