
Conversation


dmr51 commented Jan 5, 2026

This adds an adaptive Muon variant with AdamW-style elementwise preconditioning. I am a machine learning engineer but not very strong in math, so a proof that this works is still needed; I wrote this code with ChatGPT's help. Still, this adaptive Muon variation gives a 1.5x speedup in convergence in my private CIFAR-10 benchmark with almost the same accuracy (93.9% vs. 94.1%).
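For illustration, here is a minimal sketch of what this description seems to propose: an Adam-style elementwise second-moment rescaling applied to the momentum buffer before the Newton-Schulz orthogonalization. This is not the PR's actual code; the function names and hyperparameters are assumptions, and the Newton-Schulz helper uses the coefficients from the public Muon implementation.

```python
import torch

def zeropower_via_newtonschulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration used by Muon to approximately
    # orthogonalize a 2D momentum matrix (coefficients as in the
    # public Muon code).
    a, b, c = (3.4445, -4.7750, 2.0315)
    X = G / (G.norm() + 1e-7)
    if G.size(0) > G.size(1):
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X

def adaptive_muon_step(param, grad, momentum, second_moment, step,
                       lr=0.02, beta1=0.95, beta2=0.999, eps=1e-8):
    # Muon-style momentum accumulation on the raw gradient.
    momentum.mul_(beta1).add_(grad)
    # Adam-style elementwise second moment of the raw gradient,
    # with bias correction (step starts at 1).
    second_moment.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    v_hat = second_moment / (1 - beta2 ** step)
    # Elementwise preconditioning applied BEFORE orthogonalization,
    # which is the ordering this PR argues for.
    preconditioned = momentum / (v_hat.sqrt() + eps)
    param.add_(zeropower_via_newtonschulz(preconditioned), alpha=-lr)
```

Here `param`, `momentum`, and `second_moment` would be 2D tensors of the same shape, with the state buffers initialized to zeros.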

dmr51 (Author) commented Jan 5, 2026

I found the AdaMuon paper: https://arxiv.org/abs/2507.11005. It seems close to what I got with ChatGPT, but as I understand it, AdaMuon applies the AdamW-style step after the orthogonalization. As I said, I am not very strong in math, but ChatGPT suggests it is better to apply the AdamW-style step before the orthogonalization when the step is elementwise rather than layerwise, because applying it afterwards can undermine the point of Muon altogether.
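For contrast, a sketch of the ordering this comment attributes to AdaMuon: orthogonalize the momentum first, then apply the elementwise second-moment rescaling to the orthogonalized update. It reuses the `zeropower_via_newtonschulz` helper from the sketch above; any further details of the actual AdaMuon algorithm are not reproduced here, and the names and hyperparameters are again placeholders.

```python
import torch

def adamuon_style_step(param, grad, momentum, second_moment, step,
                       lr=0.02, beta1=0.95, beta2=0.999, eps=1e-8):
    momentum.mul_(beta1).add_(grad)
    # Orthogonalize first (zeropower_via_newtonschulz as defined above).
    ortho = zeropower_via_newtonschulz(momentum)
    # Second moment tracked on the orthogonalized update rather than
    # the raw gradient.
    second_moment.mul_(beta2).addcmul_(ortho, ortho, value=1 - beta2)
    v_hat = second_moment / (1 - beta2 ** step)
    # Elementwise rescaling AFTER orthogonalization; the comment argues
    # this can undo the approximate orthogonality Muon relies on.
    param.add_(ortho / (v_hat.sqrt() + eps), alpha=-lr)
```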

parlance-zz commented

> I wrote this code with ChatGPT's help. Still, this adaptive Muon variation gives a 1.5x speedup in convergence in my private CIFAR-10 benchmark with almost the same accuracy

I've tried a few adaptive scaling methods, including the AdaMuon you linked above, and what I found is that if you're seeing a big gain, it's only because you didn't use a good learning rate schedule to begin with. Your base learning rate is probably too low and your learning rate decay probably isn't aggressive enough.
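As a point of comparison, here is a minimal sketch of the kind of baseline schedule this comment alludes to: a short linear warmup followed by an aggressive cosine decay of the learning rate. The function name and hyperparameters are illustrative placeholders, not values from anyone's benchmark.

```python
import math
import torch

def make_cosine_schedule(optimizer, total_steps, warmup_steps=200):
    # Linear warmup, then cosine decay of the learning rate toward zero.
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```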

