PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
Updated Apr 19, 2024 - Python
[ICLR 2025] Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization