
@zgoel317

Implemented matryoshka-style training. To use, add:

--matryoshka=True
--matryoshka_expansion_factors 8 16 32 \

to your arguments (you can obviously add different expansion factors). I also selected batchtopk as the activation, per the source below, but it should work with other activations as well:

https://www.lesswrong.com/posts/rKM9b6B2LqwSB5ToN/learning-multi-level-features-with-matryoshka-saes
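For context, the core idea can be sketched roughly as follows (a simplified NumPy sketch with hypothetical names, not this PR's actual implementation): each expansion factor defines a nested prefix of the dictionary, and each prefix must reconstruct the input on its own, while BatchTopK keeps only the k * batch_size largest activations across the whole batch.

```python
import numpy as np

def batch_topk(z, k):
    # BatchTopK: keep the k * batch_size largest activations across
    # the entire batch, zeroing everything else.
    n_keep = k * z.shape[0]
    flat = z.ravel()
    if n_keep >= flat.size:
        return z
    thresh = np.partition(flat, -n_keep)[-n_keep]
    return np.where(z >= thresh, z, 0.0)

def matryoshka_loss(x, W_enc, W_dec, factors=(8, 16, 32), k=4):
    # Each factor f defines a nested prefix of f * d_model latents;
    # the full dictionary has max(factors) * d_model latents.
    d_model = x.shape[1]
    z = batch_topk(np.maximum(x @ W_enc, 0.0), k)  # ReLU, then BatchTopK
    losses = []
    for f in sorted(factors):
        m = f * d_model                       # prefix size
        x_hat = z[:, :m] @ W_dec[:m]          # reconstruct from prefix only
        losses.append(np.mean((x - x_hat) ** 2))
    return float(np.mean(losses))

# Toy usage with random weights (illustration only).
rng = np.random.default_rng(0)
d_model = 16
d_sae = 32 * d_model
x = rng.normal(size=(8, d_model))
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
loss = matryoshka_loss(x, W_enc, W_dec)
```

Averaging the prefix losses (rather than summing) keeps the loss scale comparable across different numbers of expansion factors; either convention works as long as it is consistent.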

minlu21 and others added 30 commits July 1, 2025 22:09
…hat is what is used in the most recent EleutherAI transcoder upload for Llama
…was causing a type error, so we now clone the original parameter data, temporarily modify it, then restore the original data.
…gle slice, not just across the whole encoder once; also added debug statements to sparsecoder for educational reasons.
…w crosslayer runner - ie we are applying batchtopk+topk to each slice
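The per-slice change described in the commits above can be sketched like this (hypothetical names; a simplified NumPy version, not the PR's code): instead of running BatchTopK once over the whole encoder output, it is applied independently inside each matryoshka slice, so every nesting level is guaranteed its own active latents.

```python
import numpy as np

def batch_topk(z, k):
    # Keep the k * batch_size largest activations across the batch.
    n_keep = k * z.shape[0]
    flat = z.ravel()
    if n_keep >= flat.size:
        return z
    thresh = np.partition(flat, -n_keep)[-n_keep]
    return np.where(z >= thresh, z, 0.0)

def per_slice_batch_topk(z, slice_sizes, k_per_slice):
    # Apply BatchTopK to each matryoshka slice independently,
    # not just once across the whole encoder.
    out = np.zeros_like(z)
    start = 0
    for size, k in zip(slice_sizes, k_per_slice):
        out[:, start:start + size] = batch_topk(z[:, start:start + size], k)
        start += size
    return out

# Toy usage: a 24-wide latent split into three slices of 8,
# keeping 2 activations per example within each slice.
rng = np.random.default_rng(1)
z = rng.random((4, 24))
out = per_slice_batch_topk(z, [8, 8, 8], [2, 2, 2])
```

With expansion factors 8 16 32, the slice sizes would be the differences between consecutive prefix sizes (8·d_model, 8·d_model, 16·d_model), so the slices together form the full dictionary.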
@CLAassistant

CLAassistant commented Jul 16, 2025

CLA assistant check
All committers have signed the CLA.

@zgoel317 force-pushed the matryoshka branch 2 times, most recently from 2ce9948 to 72d836c on July 30, 2025 03:09

3 participants