From 3e1841b579f569bfeb0786ce99b26ff15c55d9d6 Mon Sep 17 00:00:00 2001
From: Xue Fuzhao <57164838+XueFuzhao@users.noreply.github.com>
Date: Wed, 30 Mar 2022 16:31:39 +0800
Subject: [PATCH] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 64ac3ba..b9a2351 100644
--- a/README.md
+++ b/README.md
@@ -38,6 +38,7 @@ This repo is a collection of AWESOME things about mixture-of-experts, including
 - SpeechMoE2: Mixture-of-Experts Model with Improved Routing [[23 Nov 2021]](https://arxiv.org/abs/2111.11831)
 - VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts [[23 Nov 2021]](https://arxiv.org/abs/2111.02358)
 - Towards More Effective and Economic Sparsely-Activated Model [[14 Oct 2021]](https://arxiv.org/abs/2110.07431)
+- M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining [[8 Oct 2021]](https://arxiv.org/abs/2110.03888)
 - Sparse MoEs meet Efficient Ensembles [[7 Oct 2021]](https://arxiv.org/abs/2110.03360)
 - MoEfication: Conditional Computation of Transformer Models for Efficient Inference [[5 Oct 2021]](https://arxiv.org/abs/2110.01786)
 - Cross-token Modeling with Conditional Computation [[5 Sep 2021]](https://arxiv.org/abs/2109.02008)