Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025)
Updated Sep 26, 2025 - Python
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount of time on any token
Dynamic weight generation for recursive transformers via input-conditioned LoRA modulation
(NeurIPS-2019 MicroNet Challenge - 3rd Winner) Open source code for "SIPA: A simple framework for efficient networks"
Temporalmesh-transformer. It is the first architecture to simultaneously fuse dynamic graph topology, token-level adaptive compute, and temporal semantic decay into a single unified model. No prior work does all three together.
Volumetric language model with Triangle Cross-Scan State Modelling. Without Attention. With Neural Turing Machines (NTM) & Differentiable Neural Computers (DNC) smells
PyTorch benchmark for CTM-style adaptive computation, sparse-retrieval failure analysis, adaptive halting, and attention-supervised recovery.
Lightweight PyTorch implementation of Mixture-of-Recursions with Expert-Choice & Token-Choice routing | Runs on your laptop!