Commit 5b97e51: Update README.md
XueFuzhao authored Mar 30, 2022 (1 parent: be7ddb6)
Showing 1 changed file, README.md, with 12 additions and 2 deletions.

This repo is a collection of AWESOME things about mixture-of-experts, including:
- [Papers](#papers)
- [MoE Model](#moe-model)
- [MoE System](#moe-system)

- [Library](#library)

# Papers
## MoE Model
**Conference**
- Go Wider Instead of Deeper [[AAAI2022]](https://arxiv.org/abs/2107.11817)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer [[ICLR2017]](https://openreview.net/forum?id=B1ckMDqlg)

**Arxiv**
- One Student Knows All Experts Know: From Sparse to Dense [[26 Jan 2022]](https://arxiv.org/abs/2201.10890)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [[13 Dec 2021]](https://arxiv.org/abs/2112.06905)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [[5 Oct 2021]](https://arxiv.org/abs/2110.01786)
- Cross-token Modeling with Conditional Computation [[5 Sep 2021]](https://arxiv.org/abs/2109.02008)
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [[11 Jan 2021]](https://arxiv.org/abs/2101.03961)
- Exploring Routing Strategies for Multilingual Mixture-of-Experts Models [[28 Sept 2020]](https://arxiv.org/abs/2101.03961)
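
Most of the model papers above build on the same sparsely gated layer from the ICLR 2017 paper listed here: a small router scores the experts for each token, and only the top-k scoring experts are run and mixed (Switch Transformers uses k = 1). The PyTorch sketch below is purely illustrative; the class and argument names are placeholders, not code from any listed paper.

```python
# A minimal sketch of a sparsely gated top-k MoE layer (illustrative names only).
# Each token is scored by a linear router and processed only by its k
# highest-scoring expert FFNs; k=1 corresponds to Switch Transformers routing.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.gate(x)                                      # (tokens, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)        # (tokens, k)
        weights = F.softmax(topk_scores, dim=-1)                   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel():                                  # skip experts that received no tokens
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out


if __name__ == "__main__":
    layer = TopKMoE(d_model=64, d_hidden=256, num_experts=8, k=2)
    print(layer(torch.randn(10, 64)).shape)                       # torch.Size([10, 64])
```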

## MoE System
**Conference**
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding [[ICLR2021]](https://openreview.net/forum?id=qrwe7XHTmYb)

**Arxiv**
- DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale [[14 Jan 2022]](https://arxiv.org/abs/2201.05596)
- FastMoE: A Fast Mixture-of-Expert Training System [[24 Mar 2021]](https://arxiv.org/abs/2103.13262)
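
The systems above add expert parallelism on top of such a layer: experts are sharded across devices, and routed tokens are exchanged with an all-to-all collective before (and, symmetrically, after) the expert computation. The sketch below shows only the dispatch step in plain torch.distributed; the function and variable names are illustrative and not taken from GShard, FastMoE, or DeepSpeed-MoE. It assumes one expert per rank, top-1 routing, and a backend with all-to-all support, and would be launched with torchrun.

```python
# Illustrative expert-parallel dispatch: one expert per rank, top-1 routing.
# Launch with e.g. `torchrun --nproc_per_node=4 moe_dispatch_sketch.py`.
# Requires a backend that supports all-to-all; the CPU demo uses Gloo, which
# may need a recent PyTorch build (otherwise use MPI, or NCCL on GPUs).
import torch
import torch.distributed as dist


def dispatch_to_experts(tokens: torch.Tensor, dest_rank: torch.Tensor, world_size: int):
    """Send each token to the rank that hosts its chosen expert via all-to-all."""
    order = torch.argsort(dest_rank)                         # group tokens by destination rank
    send_counts = torch.bincount(dest_rank, minlength=world_size)
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)         # exchange per-rank token counts
    recv_buf = tokens.new_empty(int(recv_counts.sum()), tokens.size(-1))
    dist.all_to_all_single(
        recv_buf,
        tokens[order],
        output_split_sizes=recv_counts.tolist(),
        input_split_sizes=send_counts.tolist(),
    )
    # recv_buf now holds every token routed to this rank's expert; after the
    # expert runs, a symmetric all-to-all (splits reversed) returns the results.
    return recv_buf, order


if __name__ == "__main__":
    dist.init_process_group("gloo")                          # CPU demo; use NCCL on GPUs
    rank, world = dist.get_rank(), dist.get_world_size()
    tokens = torch.randn(16, 8)                              # 16 local tokens, hidden size 8
    dest_rank = torch.randint(0, world, (16,))               # stand-in for a learned top-1 router
    local_batch, _ = dispatch_to_experts(tokens, dest_rank, world)
    print(f"rank {rank}: expert receives {local_batch.size(0)} tokens")
    dist.destroy_process_group()
```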

# Library
- [Tutel: An efficient mixture-of-experts implementation for large DNN model training](https://github.com/microsoft/tutel)
- [Mesh-TensorFlow: Deep Learning for Supercomputers](https://github.com/tensorflow/mesh)
- [FastMoE: A Fast Mixture-of-Expert Training System](https://github.com/laekov/fastmoe)
- [DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale](https://github.com/microsoft/DeepSpeed)
