Add Qwen2MoE #29377

Merged
merged 46 commits into from
Mar 27, 2024
Changes from 8 commits
Commits
4f933bb
add support for qwen2 MoE models
Feb 28, 2024
8ad6c9e
update docs
Feb 28, 2024
fbce3b9
add support for qwen2 MoE models
Feb 28, 2024
c32b998
update docs
Feb 28, 2024
8274f89
Merge branch 'qwen2_moe' of https://github.com/bozheng-hit/transforme…
Feb 28, 2024
e44f700
update model name & test
Feb 29, 2024
b09e2ed
update readme
Feb 29, 2024
d5e99a6
update class names & readme & model_doc of Qwen2MoE.
Feb 29, 2024
1625b1f
update architecture name
Feb 29, 2024
051e19d
fix qwen2_moe tests
Feb 29, 2024
307d9de
use Qwen2Tokenizer instead of Qwen2MoeTokenizer
Mar 1, 2024
4d80bf8
update modeling_qwen2_moe.py
Mar 1, 2024
8b6d57b
fix model architecture
Mar 9, 2024
b9c2803
fix qwen2_moe tests
Feb 29, 2024
f8e1819
use Qwen2Tokenizer instead of Qwen2MoeTokenizer
Mar 1, 2024
e4b8445
update modeling_qwen2_moe.py
Mar 1, 2024
8d74bb0
fix model architecture
Mar 9, 2024
a50a208
fix style
Mar 10, 2024
a04c698
fix test when there are sparse and non sparse layers
Mar 10, 2024
dc53a8d
fixup
Mar 21, 2024
8f55aa5
Update README.md
bozheng-hit Mar 21, 2024
6a06f8e
fix up
Mar 21, 2024
bf11227
fixup
Mar 22, 2024
e3038db
fixup
Mar 23, 2024
5c627d3
add archive back
Mar 23, 2024
765ebf5
add support for qwen2 MoE models
Feb 28, 2024
1c973fb
update docs
Feb 28, 2024
0841722
update model name & test
Feb 29, 2024
4c0b2b1
update readme
Feb 29, 2024
8958743
update class names & readme & model_doc of Qwen2MoE.
Feb 29, 2024
1e099c5
update architecture name
Feb 29, 2024
4906cdf
fix qwen2_moe tests
Feb 29, 2024
82729ec
use Qwen2Tokenizer instead of Qwen2MoeTokenizer
Mar 1, 2024
a3aa52d
update modeling_qwen2_moe.py
Mar 1, 2024
0686cc6
fix model architecture
Mar 9, 2024
c074021
fixup
Mar 21, 2024
2484604
fix qwen2_moe tests
Feb 29, 2024
5d1ed37
use Qwen2Tokenizer instead of Qwen2MoeTokenizer
Mar 1, 2024
27afcd5
fix style
Mar 10, 2024
0d155e9
fix test when there are sparse and non sparse layers
Mar 10, 2024
46b0918
fixup
Mar 23, 2024
45219a1
add archive back
Mar 23, 2024
cf61e7e
fixup
Mar 25, 2024
3b9f3a8
fix integration test
Mar 26, 2024
4077877
fixup
Mar 26, 2024
4d931f0
Merge branch 'main' into qwen2_moe
bozheng-hit Mar 26, 2024
1 change: 1 addition & 0 deletions tests/models/qwen2_moe/test_modeling_qwen2_moe.py
@@ -509,6 +509,7 @@ def test_load_balancing_loss(self):
         config, input_dict = self.model_tester.prepare_config_and_inputs_for_common()
         config.num_labels = 3
         config.num_experts = 8
+        config.expert_interval = 2
         config.output_router_logits = True
         input_ids = input_dict["input_ids"]
         attention_mask = input_ids.ne(1).to(torch_device)
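One of the commits in this PR is titled "fix test when there are sparse and non sparse layers", and the hunk above sets `config.expert_interval = 2` in the load-balancing test. As a rough illustration of what such an interval could mean, the sketch below picks which decoder layers would use an MoE block. The selection rule `(layer_idx + 1) % expert_interval == 0` and the helper name `sparse_layer_indices` are assumptions made for illustration; neither appears in the diff, and the actual model code may differ.

```python
from typing import List


def sparse_layer_indices(num_hidden_layers: int, expert_interval: int) -> List[int]:
    """Return the layer indices that would be sparse (MoE) layers.

    Hypothetical rule, assumed for illustration only: a layer is sparse
    when (layer_idx + 1) is a multiple of expert_interval.
    """
    if expert_interval < 1:
        raise ValueError("expert_interval must be >= 1")
    return [
        layer_idx
        for layer_idx in range(num_hidden_layers)
        if (layer_idx + 1) % expert_interval == 0
    ]


# With 4 layers and an interval of 2, every second layer is sparse.
print(sparse_layer_indices(num_hidden_layers=4, expert_interval=2))  # [1, 3]
```

Under this assumed rule, an interval of 2 yields a model with both sparse and dense layers, which is the mixed case the commit title refers to. An interval of 1 would make every layer sparse.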