[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Updated Feb 7, 2025 - Python
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
Yet another random morning idea, to be tried quickly and the architecture shared if it works: allow the transformer to pause for any amount of time on any token
(NeurIPS-2019 MicroNet Challenge - 3rd Winner) Open source code for "SIPA: A simple framework for efficient networks"