
Add Qwen2MoE #29377

Merged 46 commits into huggingface:main on Mar 27, 2024
Conversation

@bozheng-hit (Contributor) commented Feb 29, 2024

Adding Qwen2MoE

This PR adds support for the upcoming Qwen2MoE models. For information about Qwen, please visit https://github.com/QwenLM/Qwen. @ArthurZucker
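
Once the checkpoint is on the Hub, the model is meant to load through the standard auto classes. A minimal usage sketch (the checkpoint id below is an assumption until the release is public):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # assumed Hub id, pending the public release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Give me a short introduction to large language models.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```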

@ArthurZucker (Collaborator) left a comment

Thanks! A few small comments from a quick first review for now!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment

Thanks! Overall alright. One thing we need to be careful about is testing, in test_modeling_qwen2_moe, the case where there are both sparse and non-sparse layers. The loss needs to ignore None router logits!
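
A sketch of what I mean, illustrative only and not the implementation in this PR: dense layers contribute None to the router-logits tuple, and the auxiliary load-balancing loss should simply skip those entries.

```python
import torch
import torch.nn.functional as F

def load_balancing_loss_sketch(gate_logits, num_experts, top_k=2):
    # gate_logits has one entry per decoder layer; dense (non-sparse)
    # layers contribute None and must be skipped, not crash the loss.
    valid = [g for g in gate_logits if g is not None]
    if not valid:
        return torch.tensor(0.0)
    logits = torch.cat(valid, dim=0)                                # (tokens, num_experts)
    probs = F.softmax(logits, dim=-1)
    _, selected_experts = torch.topk(probs, top_k, dim=-1)
    expert_mask = F.one_hot(selected_experts, num_experts).float()  # (tokens, top_k, E)
    tokens_per_expert = expert_mask.mean(dim=0)                     # (top_k, E)
    router_prob_per_expert = probs.mean(dim=0)                      # (E,)
    return num_experts * torch.sum(tokens_per_expert.mean(dim=0) * router_prob_per_expert)

# Two sparse layers plus one dense layer (None) must not raise:
gate_logits = (torch.randn(4, 8), None, torch.randn(4, 8))
print(load_balancing_loss_sketch(gate_logits, num_experts=8))
```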

Review thread on src/transformers/models/qwen2_moe/modeling_qwen2_moe.py (outdated, resolved)
@JustinLin610 (Contributor) commented

Hey, I guess our update to the tests solves the issues mentioned. Could you take another look?

@ArthurZucker (Collaborator) left a comment

LGTM, almost no comments. I'll do another full review tomorrow, but it is pretty much mergeable! 🔥

Mostly, it would be nice to use the latest API for cache_positions in the model, but it's really optional!
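
For reference, the cache position tensor in the newer API is just the index of each current token within the full (cached + new) sequence; a rough illustration of how it is derived, not the code in this PR:

```python
import torch

def make_cache_position(past_seen_tokens: int, seq_len: int) -> torch.Tensor:
    # Positions of the tokens in this forward pass within the overall sequence.
    return torch.arange(past_seen_tokens, past_seen_tokens + seq_len)

print(make_cache_position(0, 7))  # prefill of a 7-token prompt -> tensor([0, 1, ..., 6])
print(make_cache_position(7, 1))  # next decode step, 7 tokens cached -> tensor([7])
```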

Review threads (resolved):
- README.md (outdated)
- docs/source/en/model_doc/qwen2_moe.md
- docs/source/en/model_doc/qwen2_moe.md
- src/transformers/models/qwen2_moe/modeling_qwen2_moe.py (outdated)
- src/transformers/models/qwen2_moe/modeling_qwen2_moe.py (outdated)
- src/transformers/models/qwen2_moe/modeling_qwen2_moe.py (outdated)
bozheng-hit and others added 2 commits March 21, 2024 23:27
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@bozheng-hit (Contributor, Author) commented

@ArthurZucker Hi, I have rebased our branch and resolved all the conflicts. I think the code is ready to be merged now.

@ArthurZucker (Collaborator) commented

Thanks for your efforts! Merging 🥳

@ArthurZucker merged commit 1c39974 into huggingface:main on Mar 27, 2024
21 checks passed
@ydshieh (Collaborator) commented Mar 27, 2024

Hi @bozheng-hit

Thank you for adding this model 🚀

I see Qwen/Qwen1.5-MoE-A2.7B is used in the tests, but I could not find it on the Hub. See https://huggingface.co/models?search=Qwen1.5-MoE

Could you check this, open a PR to make some necessary updates, and make sure the integration tests pass by running

RUN_SLOW=1 TF_FORCE_GPU_ALLOW_GROWTH=yes python3 -m pytest -v tests/models/qwen2_moe

Thank you in advance

@lucasjinreal commented
I didn't see Qwen/Qwen1.5-MoE-A2.7B anywhere. Is there a time estimate for the new model release?

itazap pushed a commit that referenced this pull request May 14, 2024
* add support for qwen2 MoE models

* update docs

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* Update README.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* fixup

* add archive back

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fixup

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* add archive back

* fix integration test

* fixup

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>