
Add Qwen2MoE #29377

Merged 46 commits into huggingface:main on Mar 27, 2024
Conversation

@bozheng-hit (Contributor) commented Feb 29, 2024

Adding Qwen2MoE

This PR adds support for the upcoming Qwen2MoE models. For information about Qwen, please visit https://github.com/QwenLM/Qwen. @ArthurZucker
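
Once the checkpoint is on the Hub, the model is meant to load through the standard auto classes. A minimal usage sketch (the checkpoint id below is an assumption until the release is public):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # assumed Hub id, pending the public release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Give me a short introduction to large language models.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```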

@ArthurZucker (Collaborator) left a comment

Thanks! A few small comments from a quick first review for now!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment

Thanks! Overall alright. One thing we need to be careful about is testing, in test_modeling_qwen2_moe, the case where there are both sparse and non-sparse layers. The loss needs to ignore None router logits!
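
A sketch of what I mean, illustrative only and not the implementation in this PR: dense layers contribute None to the router-logits tuple, and the auxiliary load-balancing loss should simply skip those entries.

```python
import torch
import torch.nn.functional as F

def load_balancing_loss_sketch(gate_logits, num_experts, top_k=2):
    # gate_logits has one entry per decoder layer; dense (non-sparse)
    # layers contribute None and must be skipped, not crash the loss.
    valid = [g for g in gate_logits if g is not None]
    if not valid:
        return torch.tensor(0.0)
    logits = torch.cat(valid, dim=0)                                # (tokens, num_experts)
    probs = F.softmax(logits, dim=-1)
    _, selected_experts = torch.topk(probs, top_k, dim=-1)
    expert_mask = F.one_hot(selected_experts, num_experts).float()  # (tokens, top_k, E)
    tokens_per_expert = expert_mask.mean(dim=0)                     # (top_k, E)
    router_prob_per_expert = probs.mean(dim=0)                      # (E,)
    return num_experts * torch.sum(tokens_per_expert.mean(dim=0) * router_prob_per_expert)

# Two sparse layers plus one dense layer (None) must not raise:
gate_logits = (torch.randn(4, 8), None, torch.randn(4, 8))
print(load_balancing_loss_sketch(gate_logits, num_experts=8))
```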

Review thread on src/transformers/models/qwen2_moe/modeling_qwen2_moe.py (outdated, resolved)
@JustinLin610 (Contributor) commented

Hey, I guess our update to the tests solves the issues mentioned. Could you take another look?

@ArthurZucker (Collaborator) left a comment

LGTM, almost no comments. I'll do another full review tomorrow, but it is pretty much mergeable! 🔥

Mostly, it would be nice to use the latest API for cache_positions in the model, but it's really optional!
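
For reference, the cache position tensor in the newer API is just the index of each current token within the full (cached + new) sequence; a rough illustration of how it is derived, not the code in this PR:

```python
import torch

def make_cache_position(past_seen_tokens: int, seq_len: int) -> torch.Tensor:
    # Positions of the tokens in this forward pass within the overall sequence.
    return torch.arange(past_seen_tokens, past_seen_tokens + seq_len)

print(make_cache_position(0, 7))  # prefill of a 7-token prompt -> tensor([0, 1, ..., 6])
print(make_cache_position(7, 1))  # next decode step, 7 tokens cached -> tensor([7])
```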

Review threads (resolved):
- README.md (outdated)
- docs/source/en/model_doc/qwen2_moe.md
- docs/source/en/model_doc/qwen2_moe.md
- src/transformers/models/qwen2_moe/modeling_qwen2_moe.py (outdated)
- src/transformers/models/qwen2_moe/modeling_qwen2_moe.py (outdated)
- src/transformers/models/qwen2_moe/modeling_qwen2_moe.py (outdated)
bozheng-hit and others added 2 commits March 21, 2024 23:27
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@bozheng-hit (Contributor, Author) commented

@ArthurZucker Hi, I have rebased our branch and resolved all the conflicts. I think the code is ready to be merged now.

@ArthurZucker (Collaborator) commented

Thanks for your efforts! Merging 🥳

@ArthurZucker merged commit 1c39974 into huggingface:main on Mar 27, 2024
21 checks passed
@ydshieh (Collaborator) commented Mar 27, 2024

Hi @bozheng-hit

Thank you for adding this model 🚀

I see Qwen/Qwen1.5-MoE-A2.7B is used in the tests, but I could not find it on the Hub. See https://huggingface.co/models?search=Qwen1.5-MoE

Could you check this, open a PR to make some necessary updates, and make sure the integration tests pass by running

RUN_SLOW=1 TF_FORCE_GPU_ALLOW_GROWTH=yes python3 -m pytest -v tests/models/qwen2_moe

Thank you in advance

@lucasjinreal commented
I didn't see Qwen/Qwen1.5-MoE-A2.7B anywhere. Is there a time estimate for the new model release?

itazap pushed a commit that referenced this pull request May 14, 2024
* add support for qwen2 MoE models

* update docs

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* Update README.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* fixup

* add archive back

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fixup

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* add archive back

* fix integration test

* fixup

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>