Add Qwen2MoE #29377
Merged: 46 commits, Mar 27, 2024
Commits (46)
The diff below shows the changes from 1 commit.
4f933bb  add support for qwen2 MoE models (Feb 28, 2024)
8ad6c9e  update docs (Feb 28, 2024)
fbce3b9  add support for qwen2 MoE models (Feb 28, 2024)
c32b998  update docs (Feb 28, 2024)
8274f89  Merge branch 'qwen2_moe' of https://github.com/bozheng-hit/transforme… (Feb 28, 2024)
e44f700  update model name & test (Feb 29, 2024)
b09e2ed  update readme (Feb 29, 2024)
d5e99a6  update class names & readme & model_doc of Qwen2MoE. (Feb 29, 2024)
1625b1f  update architecture name (Feb 29, 2024)
051e19d  fix qwen2_moe tests (Feb 29, 2024)
307d9de  use Qwen2Tokenizer instead of Qwen2MoeTokenizer (Mar 1, 2024)
4d80bf8  update modeling_qwen2_moe.py (Mar 1, 2024)
8b6d57b  fix model architecture (Mar 9, 2024)
b9c2803  fix qwen2_moe tests (Feb 29, 2024)
f8e1819  use Qwen2Tokenizer instead of Qwen2MoeTokenizer (Mar 1, 2024)
e4b8445  update modeling_qwen2_moe.py (Mar 1, 2024)
8d74bb0  fix model architecture (Mar 9, 2024)
a50a208  fix style (Mar 10, 2024)
a04c698  fix test when there are sparse and non sparse layers (Mar 10, 2024)
dc53a8d  fixup (Mar 21, 2024)
8f55aa5  Update README.md (bozheng-hit, Mar 21, 2024)
6a06f8e  fix up (Mar 21, 2024)
bf11227  fixup (Mar 22, 2024)
e3038db  fixup (Mar 23, 2024)
5c627d3  add archive back (Mar 23, 2024)
765ebf5  add support for qwen2 MoE models (Feb 28, 2024)
1c973fb  update docs (Feb 28, 2024)
0841722  update model name & test (Feb 29, 2024)
4c0b2b1  update readme (Feb 29, 2024)
8958743  update class names & readme & model_doc of Qwen2MoE. (Feb 29, 2024)
1e099c5  update architecture name (Feb 29, 2024)
4906cdf  fix qwen2_moe tests (Feb 29, 2024)
82729ec  use Qwen2Tokenizer instead of Qwen2MoeTokenizer (Mar 1, 2024)
a3aa52d  update modeling_qwen2_moe.py (Mar 1, 2024)
0686cc6  fix model architecture (Mar 9, 2024)
c074021  fixup (Mar 21, 2024)
2484604  fix qwen2_moe tests (Feb 29, 2024)
5d1ed37  use Qwen2Tokenizer instead of Qwen2MoeTokenizer (Mar 1, 2024)
27afcd5  fix style (Mar 10, 2024)
0d155e9  fix test when there are sparse and non sparse layers (Mar 10, 2024)
46b0918  fixup (Mar 23, 2024)
45219a1  add archive back (Mar 23, 2024)
cf61e7e  fixup (Mar 25, 2024)
3b9f3a8  fix integration test (Mar 26, 2024)
4077877  fixup (Mar 26, 2024)
4d931f0  Merge branch 'main' into qwen2_moe (bozheng-hit, Mar 26, 2024)
Commit e3038db85926925755079df1dc54d2b21eac02fc: fixup
bozheng-hit committed Mar 23, 2024
8 changes: 0 additions & 8 deletions in src/transformers/models/qwen2_moe/configuration_qwen2_moe.py

@@ -20,10 +20,6 @@

 logger = logging.get_logger(__name__)

-QWEN2MOE_PRETRAINED_CONFIG_ARCHIVE_MAP = {
-    "Qwen/Qwen1.5-MoE-A2.7B": "https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B/resolve/main/config.json",
-}
-

 class Qwen2MoeConfig(PretrainedConfig):
     r"""

@@ -151,10 +147,6 @@ def __init__(
         self.sliding_window = sliding_window
         self.max_window_layers = max_window_layers

-        # for backward compatibility
-        if num_key_value_heads is None:
-            num_key_value_heads = num_attention_heads
-
         self.num_key_value_heads = num_key_value_heads
         self.hidden_act = hidden_act
         self.initializer_range = initializer_range
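For context, a minimal usage sketch (not part of this diff) of how the config behaves after the change above; it assumes a transformers release that ships Qwen2MoE (v4.40.0 or later) and network access to the Hub:

from transformers import Qwen2MoeConfig

# With QWEN2MOE_PRETRAINED_CONFIG_ARCHIVE_MAP removed, the checkpoint's config is
# resolved through the Hub like any other model, so nothing changes for callers:
config = Qwen2MoeConfig.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")
print(config.num_attention_heads, config.num_key_value_heads)

# Hand-built configs still work; after this change num_key_value_heads is stored
# exactly as passed (the `if num_key_value_heads is None:` fallback to
# num_attention_heads deleted above no longer applies):
small = Qwen2MoeConfig(num_attention_heads=16, num_key_value_heads=4)
print(small.num_key_value_heads)  # 4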
6 changes: 1 addition & 5 deletions in src/transformers/models/qwen2_moe/modeling_qwen2_moe.py

@@ -56,11 +56,6 @@

 _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
 _CONFIG_FOR_DOC = "Qwen2MoeConfig"

-QWEN2MOE_PRETRAINED_MODEL_ARCHIVE_LIST = [
-    "Qwen/Qwen1.5-MoE-A2.7B",
-    # See all Qwen2 models at https://huggingface.co/models?filter=qwen2
-]
-

 # Copied from transformers.models.mixtral.modeling_mixtral.load_balancing_loss_func
 def load_balancing_loss_func(

@@ -1492,6 +1487,7 @@ def _reorder_cache(past_key_values, beam_idx):
     """,
     QWEN2MOE_START_DOCSTRING,
 )
+# Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
 class Qwen2MoeForSequenceClassification(Qwen2MoePreTrainedModel):
     def __init__(self, config):
         super().__init__(config)

(A review conversation on the Qwen2MoeForSequenceClassification line was marked as resolved by bozheng-hit.)
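As a follow-up to the added "Copied from" marker, a minimal sketch (not part of this PR) of using the sequence-classification head it annotates. It assumes a transformers release with Qwen2MoE support, PyTorch installed, and enough memory and bandwidth to load the Qwen/Qwen1.5-MoE-A2.7B checkpoint referenced in the diff:

import torch
from transformers import AutoTokenizer, Qwen2MoeForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")
# The base checkpoint has no classification head, so the score layer is freshly
# initialized here; num_labels controls its output width.
model = Qwen2MoeForSequenceClassification.from_pretrained(
    "Qwen/Qwen1.5-MoE-A2.7B", num_labels=2
)

inputs = tokenizer("MoE layers route each token to a few experts.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)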