
add moe topk(k>2) gate support #5881

Merged · 8 commits merged into microsoft:master on Aug 15, 2024
Conversation

@inkcherry (Contributor) commented Aug 8, 2024

Some users need top-k with k > 2 to train MoE models, for example Qwen2-57B-A14B (https://huggingface.co/Qwen/Qwen2-57B-A14B/blob/main/config.json). This PR adds support for top-k (k > 2) gating.

  • add top-k (k > 2) support (sketched below)
  • add a drop-token policy based on position and probabilities
  • unit tests
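
To make the change concrete, here is a minimal, self-contained sketch of top-k (k > 2) gating with both drop-token policies. This is not the actual deepspeed/moe/sharded_moe.py implementation; the function name `topk_gate`, its signature, and the `capacity` handling are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def topk_gate(logits: torch.Tensor, k: int, capacity: int,
              drop_policy: str = "position"):
    """Illustrative top-k (k > 2) gate with capacity-based token dropping.

    logits: [num_tokens, num_experts] raw gate scores.
    Returns the normalized top-k gate weights, the chosen expert indices,
    and a keep-mask after enforcing per-expert capacity.
    """
    probs = F.softmax(logits, dim=-1)                    # [tokens, experts]
    topk_probs, topk_idx = torch.topk(probs, k, dim=-1)  # [tokens, k]
    # Renormalize so the k selected gate weights sum to 1 per token.
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

    num_experts = logits.size(-1)
    keep = torch.ones_like(topk_probs, dtype=torch.bool)

    if drop_policy == "probs":
        # Probability-based dropping: rank each expert's assignments by gate
        # weight and keep only the top `capacity` of them.
        flat_idx = topk_idx.reshape(-1)
        flat_probs = topk_probs.reshape(-1)
        for e in range(num_experts):
            slots = (flat_idx == e).nonzero(as_tuple=True)[0]
            if slots.numel() > capacity:
                order = flat_probs[slots].argsort(descending=True)
                keep.view(-1)[slots[order[capacity:]]] = False
    else:
        # Position-based dropping: earlier tokens win; an assignment is
        # dropped once its expert has already received `capacity` earlier
        # assignments.
        one_hot = F.one_hot(topk_idx, num_experts)        # [tokens, k, experts]
        flat = one_hot.reshape(-1, num_experts)
        rank = flat.cumsum(dim=0) - flat                  # arrivals before each slot
        position = (rank * flat).sum(dim=-1).view_as(topk_idx)
        keep = position < capacity

    return topk_probs * keep, topk_idx, keep

if __name__ == "__main__":
    logits = torch.randn(8, 4)   # 8 tokens, 4 experts
    gates, experts, kept = topk_gate(logits, k=3, capacity=6)
    print(gates.shape, experts.shape, kept.sum().item())
```

Under the position policy an assignment survives only if its expert still has spare capacity when the token arrives; under the probs policy each expert keeps its highest-weighted assignments regardless of arrival order. In this sketch weights are renormalized before dropping, so a token's kept weights may sum to less than 1 once some of its assignments are dropped.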

inkcherry and others added 3 commits August 8, 2024 08:09
* [MoE] enable topk > 2 gate

* print_version

* refine code

* deepspeed/moe/sharded_moe.py

* func verify

* refine code

* refine code

* refine code

* refine code

* refine code

* remove duplicate topk

* update

* refine code

* fix format

* update

* fix ==

* update

* add ut

* rm tt

* update

* add top3 ut

* revert note

* remove -

---------

Co-authored-by: Kurt Chen <kurt.chen@intel.com>
Co-authored-by: Jin, Youzhi <youzhi.jin@intel.com>
@tjruwase tjruwase requested review from tohtana and removed request for awan-10 and loadams August 9, 2024 10:41
@tohtana tohtana enabled auto-merge August 15, 2024 16:08
@tohtana (Contributor) commented Aug 15, 2024

Thank you @inkcherry for the great contribution! I approved this PR and scheduled it for merging.

@tohtana tohtana added this pull request to the merge queue Aug 15, 2024
Merged via the queue into microsoft:master with commit 9a3ede7 Aug 15, 2024
11 checks passed