
@zgoel317

Implemented matryoshka-style training. To use, add:

--matryoshka=True
--matryoshka_expansion_factors 8 16 32 \

to your arguments (you can obviously add different expansion factors). I also selected batchtopk as the activation, per the source below, but it should work with other activations as well:

https://www.lesswrong.com/posts/rKM9b6B2LqwSB5ToN/learning-multi-level-features-with-matryoshka-saes
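For context, the core idea can be sketched roughly as follows (a simplified NumPy sketch with hypothetical names, not this PR's actual implementation): each expansion factor defines a nested prefix of the dictionary, and each prefix must reconstruct the input on its own, while BatchTopK keeps only the k * batch_size largest activations across the whole batch.

```python
import numpy as np

def batch_topk(z, k):
    # BatchTopK: keep the k * batch_size largest activations across
    # the entire batch, zeroing everything else.
    n_keep = k * z.shape[0]
    flat = z.ravel()
    if n_keep >= flat.size:
        return z
    thresh = np.partition(flat, -n_keep)[-n_keep]
    return np.where(z >= thresh, z, 0.0)

def matryoshka_loss(x, W_enc, W_dec, factors=(8, 16, 32), k=4):
    # Each factor f defines a nested prefix of f * d_model latents;
    # the full dictionary has max(factors) * d_model latents.
    d_model = x.shape[1]
    z = batch_topk(np.maximum(x @ W_enc, 0.0), k)  # ReLU, then BatchTopK
    losses = []
    for f in sorted(factors):
        m = f * d_model                       # prefix size
        x_hat = z[:, :m] @ W_dec[:m]          # reconstruct from prefix only
        losses.append(np.mean((x - x_hat) ** 2))
    return float(np.mean(losses))

# Toy usage with random weights (illustration only).
rng = np.random.default_rng(0)
d_model = 16
d_sae = 32 * d_model
x = rng.normal(size=(8, d_model))
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
loss = matryoshka_loss(x, W_enc, W_dec)
```

Averaging the prefix losses (rather than summing) keeps the loss scale comparable across different numbers of expansion factors; either convention works as long as it is consistent.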

minlu21 and others added 30 commits July 1, 2025 22:09
…hat is what is used in the most recent EleutherAI transcoder upload for Llama
…was causing a type error, so we now clone the original parameter data, temporarily modify it, then restore the original data.
…gle slice, not just across the whole encoder once; also added debug statements to sparsecoder for educational reasons.
…w crosslayer runner - ie we are applying batchtopk+topk to each slice
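The per-slice change described in the commits above can be sketched like this (hypothetical names; a simplified NumPy version, not the PR's code): instead of running BatchTopK once over the whole encoder output, it is applied independently inside each matryoshka slice, so every nesting level is guaranteed its own active latents.

```python
import numpy as np

def batch_topk(z, k):
    # Keep the k * batch_size largest activations across the batch.
    n_keep = k * z.shape[0]
    flat = z.ravel()
    if n_keep >= flat.size:
        return z
    thresh = np.partition(flat, -n_keep)[-n_keep]
    return np.where(z >= thresh, z, 0.0)

def per_slice_batch_topk(z, slice_sizes, k_per_slice):
    # Apply BatchTopK to each matryoshka slice independently,
    # not just once across the whole encoder.
    out = np.zeros_like(z)
    start = 0
    for size, k in zip(slice_sizes, k_per_slice):
        out[:, start:start + size] = batch_topk(z[:, start:start + size], k)
        start += size
    return out

# Toy usage: a 24-wide latent split into three slices of 8,
# keeping 2 activations per example within each slice.
rng = np.random.default_rng(1)
z = rng.random((4, 24))
out = per_slice_batch_topk(z, [8, 8, 8], [2, 2, 2])
```

With expansion factors 8 16 32, the slice sizes would be the differences between consecutive prefix sizes (8·d_model, 8·d_model, 16·d_model), so the slices together form the full dictionary.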
@CLAassistant

CLAassistant commented Jul 16, 2025

CLA assistant check
All committers have signed the CLA.

@zgoel317 force-pushed the matryoshka branch 2 times, most recently from 2ce9948 to 72d836c on July 30, 2025 03:09

3 participants