`./counting`
- `train.py`: train the character counting model, as well as generate the data
- `intervention.py`: activation patching experiments
- `visualize.ipynb`: results visualization code

`./ioi`
- `make_decoder_train_data.py`: contains code to generate the data
- `DLA.py`: implements the DLA experiments

`./addition`
- `train.py`: train the 3-digit addition model, as well as generate the data
- `intervention.py`: main activation patching experiments
- `interventionPlus.py`: activation patching experiments for the "+" sign
- `visualize.ipynb`: results visualization code

`./factual`
- `find_heads_attribution.py`: find the 25 most important heads in the upper layers
- `make_data_part1.py`: select text from COUNTERFACT and BEAR that "activates" each head (i.e., the head does not attend to BOS too much) at the END position
- `make_data_part2.py`: select text from miniPile that "activates" each head
- `cal_freq.py`: calculate token frequency over miniPile

`./decoder`
- `model.py`: defines the model architecture
- `train.py`: train the decoder
- `cache_generation.py`: generate samples using the decoder, but not in a visualized form; they need to be transferred to the Streamlit app
- `run.sh`: commands to train the decoder and generate samples
- `utils.py`, `generate.py`: functions used by other files
- `cache_attention.py`: used to save attention patterns
- `scatter_completeness.py`, `scatter_completeness_plot.py`: draw scatter plots to verify the completeness

`./training_outputs`
- contains the model checkpoints of the probed models for the counting and addition tasks, so the results are reproducible

`./LLM`
- contains prompts and code used to automatically generate interpretations with LLMs

`./webAPP`
- contains the source code for our web application

- Go to `./ioi` and run `make_decoder_train_data.py` to generate data for the IOI task. You don't need to do this for the counting and addition tasks. To run the factual recall experiment, first download the COUNTERFACT and BEAR data (the links are in `./factual/make_data_part1.py`), then go to `./factual` and run `make_data_part1.py` and `make_data_part2.py` sequentially (see the data-preparation sketch after this list).
- Go to `./decoder` and check `run.sh`. Pick a task you are interested in and train the decoder. For example: `python train.py --probed_task counting --rebalance 6.0 --save_dir $dir_name --batch_size 256 --num_epoch 100 --data_per_epoch 1000000 --num_test_rollout 200 > ./data_and_model/counting.txt`
- `run.sh` also contains commands for generating preimage samples using the decoder. For example, run `python cache_generation.py --probed_task counting`; the generated samples will appear in `./training_outputs`. The best way to inspect the generated samples is to go into the `./webAPP` folder and run `streamlit run InversionView.py` (see the sketch after this list).
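
For reference, here is a minimal sketch of the data-preparation step above, assuming the scripts are run from their own directories and need no additional command-line arguments (check each script's argument parsing before running):

```bash
# Data-preparation sketch (assumption: the scripts need no extra arguments).

# IOI task: generate decoder training data
cd ./ioi
python make_decoder_train_data.py
cd ..

# Factual recall task: download COUNTERFACT and BEAR first
# (links are in ./factual/make_data_part1.py), then:
cd ./factual
python make_data_part1.py
python make_data_part2.py
cd ..
```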
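
And a sketch of the generate-then-inspect workflow for the counting task, assuming a decoder for that task has already been trained (see `run.sh`) and the commands are run from the repository root:

```bash
# Generate preimage samples with the trained decoder, then browse them in the web app.
cd ./decoder
python cache_generation.py --probed_task counting   # samples appear in ./training_outputs

# Launch the Streamlit app to inspect the generated samples
cd ../webAPP
streamlit run InversionView.py
```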