Apple Silicon dual-backend port of autoresearch (PyTorch MPS + MLX) with full Muon optimizer
Updated Mar 23, 2026 - Python
Unofficial optimized implementation of NorMuon
LLM pretraining from scratch on FineWeb dataset (architecture and all components explained), plus optimal use of GPU on SLURM cluster
A performance-optimized Muon optimizer implementation for PyTorch
Megatron-LM fork for experiments on Alps
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks (ICLR 2026)
High-performance CUDA implementation of Muon optimizer for LLM training. Features Newton-Schulz polar decomposition, cuBLAS acceleration, and transpose optimization for 8x FLOP savings on transformer FFN layers. Benchmarked on NVIDIA A100 with Llama 3.1 8B architectures (4096×11008 weights).
OpenAI parameter-golf challenge: train the smallest LM that fits in 16MB. My SP8192 frontier submission (openai/parameter-golf#1887): 11L x 512d w/ targeted middle recurrence, MuonEq-R optimizer, int6 GPTQ+SDClip, Brotli-11 compression, legal score-first TTT. 8xH100 DDP.
ARS2C-AGA: Slide directly along the geodesic of the loss landscape toward the global optimum.
Few-Shot Adaptation for Vision-Language Models. Implements Base-to-Novel generalization on CLIP using LoRA, LP++, and Muon Optimizer to enhance performance on the Oxford Flowers-102 dataset.
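
Several entries above mention the Newton-Schulz polar decomposition at the core of Muon, and one mentions a transpose optimization. As a rough illustration only, the sketch below uses the classic cubic Newton-Schulz iteration on plain Python lists to approximate the orthogonal polar factor of a matrix. It is not taken from any of the listed repositories: the function names, the step count, and the cubic coefficients (1.5, -0.5) are assumptions chosen for clarity, and production Muon implementations typically use tuned higher-order coefficients, fewer steps, and GPU kernels. The transpose step shows the general idea of working in the orientation where the Gram matrix is smallest. It makes no claim about the specific FLOP savings quoted above.

```python
import math


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor of g (hypothetical helper).

    Uses the classic cubic iteration X <- 1.5 X - 0.5 (X X^T) X, which
    converges when all singular values of the starting X lie in (0, sqrt(3)).
    Assumes g has full rank.
    """
    rows, cols = len(g), len(g[0])
    # Transpose trick: iterate on the wide orientation so that the Gram
    # matrix X X^T has the smaller of the two dimensions.
    flipped = rows > cols
    x = transpose(g) if flipped else [list(r) for r in g]
    # Dividing by the Frobenius norm bounds every singular value by 1.
    norm = math.sqrt(sum(v * v for row in x for v in row))
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        gram = matmul(x, transpose(x))  # size min(rows, cols) squared
        gx = matmul(gram, x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(rx, rg)]
             for rx, rg in zip(x, gx)]
    return transpose(x) if flipped else x


# Example: orthogonalize a tall 3x2 matrix; the columns of q come out
# approximately orthonormal.
q = newton_schulz_orthogonalize([[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]])
```

In Muon-style optimizers the input would be a momentum or gradient matrix for a 2D weight, and the orthogonalized result would be used as the update direction. The sketch leaves that integration out.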