Commit 5b97e51: Update README.md
XueFuzhao authored Mar 30, 2022 (1 parent: be7ddb6)
Showing 1 changed file, README.md, with 12 additions and 2 deletions.

This repo is a collection of AWESOME things about mixture-of-experts, including:
- [Papers](#papers)
- [MoE Model](#moe-model)
- [MoE System](#moe-system)

- [Library](#library)

# Papers
## MoE Model
**Conference**
- Go Wider Instead of Deeper [[AAAI2022]](https://arxiv.org/abs/2107.11817)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer [[ICLR2017]](https://openreview.net/forum?id=B1ckMDqlg)

**Arxiv**
- One Student Knows All Experts Know: From Sparse to Dense [[26 Jan 2022]](https://arxiv.org/abs/2201.10890)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [[13 Dec 2021]](https://arxiv.org/abs/2112.06905)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [[5 Oct 2021]](https://arxiv.org/abs/2110.01786)
- Cross-token Modeling with Conditional Computation [[5 Sep 2021]](https://arxiv.org/abs/2109.02008)
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [[11 Jan 2021]](https://arxiv.org/abs/2101.03961)
- Exploring Routing Strategies for Multilingual Mixture-of-Experts Models [[28 Sept 2020]](https://arxiv.org/abs/2101.03961)
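
Most of the model papers above build on the same sparsely gated layer from the ICLR 2017 paper listed here: a small router scores the experts for each token, and only the top-k scoring experts are run and mixed (Switch Transformers uses k = 1). The PyTorch sketch below is purely illustrative; the class and argument names are placeholders, not code from any listed paper.

```python
# A minimal sketch of a sparsely gated top-k MoE layer (illustrative names only).
# Each token is scored by a linear router and processed only by its k
# highest-scoring expert FFNs; k=1 corresponds to Switch Transformers routing.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.gate(x)                                      # (tokens, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)        # (tokens, k)
        weights = F.softmax(topk_scores, dim=-1)                   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel():                                  # skip experts that received no tokens
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out


if __name__ == "__main__":
    layer = TopKMoE(d_model=64, d_hidden=256, num_experts=8, k=2)
    print(layer(torch.randn(10, 64)).shape)                       # torch.Size([10, 64])
```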

## MoE System
**Conference**
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding [[ICLR2021]](https://openreview.net/forum?id=qrwe7XHTmYb)

**Arxiv**
- DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale [[14 Jan 2022]](https://arxiv.org/abs/2201.05596)
- FastMoE: A Fast Mixture-of-Expert Training System [[24 Mar 2021]](https://arxiv.org/abs/2103.13262)
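
The systems above add expert parallelism on top of such a layer: experts are sharded across devices, and routed tokens are exchanged with an all-to-all collective before (and, symmetrically, after) the expert computation. The sketch below shows only the dispatch step in plain torch.distributed; the function and variable names are illustrative and not taken from GShard, FastMoE, or DeepSpeed-MoE. It assumes one expert per rank, top-1 routing, and a backend with all-to-all support, and would be launched with torchrun.

```python
# Illustrative expert-parallel dispatch: one expert per rank, top-1 routing.
# Launch with e.g. `torchrun --nproc_per_node=4 moe_dispatch_sketch.py`.
# Requires a backend that supports all-to-all; the CPU demo uses Gloo, which
# may need a recent PyTorch build (otherwise use MPI, or NCCL on GPUs).
import torch
import torch.distributed as dist


def dispatch_to_experts(tokens: torch.Tensor, dest_rank: torch.Tensor, world_size: int):
    """Send each token to the rank that hosts its chosen expert via all-to-all."""
    order = torch.argsort(dest_rank)                         # group tokens by destination rank
    send_counts = torch.bincount(dest_rank, minlength=world_size)
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)         # exchange per-rank token counts
    recv_buf = tokens.new_empty(int(recv_counts.sum()), tokens.size(-1))
    dist.all_to_all_single(
        recv_buf,
        tokens[order],
        output_split_sizes=recv_counts.tolist(),
        input_split_sizes=send_counts.tolist(),
    )
    # recv_buf now holds every token routed to this rank's expert; after the
    # expert runs, a symmetric all-to-all (splits reversed) returns the results.
    return recv_buf, order


if __name__ == "__main__":
    dist.init_process_group("gloo")                          # CPU demo; use NCCL on GPUs
    rank, world = dist.get_rank(), dist.get_world_size()
    tokens = torch.randn(16, 8)                              # 16 local tokens, hidden size 8
    dest_rank = torch.randint(0, world, (16,))               # stand-in for a learned top-1 router
    local_batch, _ = dispatch_to_experts(tokens, dest_rank, world)
    print(f"rank {rank}: expert receives {local_batch.size(0)} tokens")
    dist.destroy_process_group()
```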

# Library
- [Tutel: An efficient mixture-of-experts implementation for large DNN model training](https://github.com/microsoft/tutel)
- [Mesh-TensorFlow: Deep Learning for Supercomputers](https://github.com/tensorflow/mesh)
- [FastMoE: A Fast Mixture-of-Expert Training System](https://github.com/laekov/fastmoe)
- [DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale](https://github.com/microsoft/DeepSpeed)
