- `./counting`
  - `train.py`: trains the character counting model and generates its training data
  - `intervention.py`: activation patching experiments
  - `visualize.ipynb`: results visualization code
 
- `./ioi`
  - `make_decoder_train_data.py`: contains code to generate the data
  - `DLA.py`: implements the DLA experiments
 
- `./addition`
  - `train.py`: trains the 3-digit addition model and generates its training data
  - `intervention.py`: main activation patching experiments
  - `interventionPlus.py`: activation patching experiments for the "+" sign
  - `visualize.ipynb`: results visualization code
 
- `./factual`
  - `find_heads_attribution.py`: finds the 25 most important heads in the upper layers
  - `make_data_part1.py`: selects text from COUNTERFACT and BEAR that "activates" each head (i.e., the head does not attend too much to BOS) at the END position
  - `make_data_part2.py`: selects text from miniPile that "activates" each head
  - `cal_freq.py`: calculates token frequency over miniPile
 
- `./decoder`
  - `model.py`: defines the model architecture
  - `train.py`: trains the decoder
  - `cache_generation.py`: generates samples using the decoder, but not in a visualized form; the output needs to be loaded into the streamlit app
  - `run.sh`: commands to train the decoder and generate samples
  - `utils.py`, `generate.py`: functions used by other files
  - `cache_attention.py`: used to save attention patterns
  - `scatter_completeness.py`, `scatter_completeness_plot.py`: draw scatter plots to verify completeness
 
- `./training_outputs`: contains the model checkpoints of the probed models for the counting and addition tasks, so the results are reproducible
- `./LLM`: contains prompts and code used to automatically generate interpretations with LLMs
- `./webAPP`: contains the source code for our web application
- Go to `./ioi` and run `make_decoder_train_data.py` to generate data for the IOI task. You don't need to do this for the counting and addition tasks. To run the factual recall experiment, first download the COUNTERFACT and BEAR data (the links are in `./factual/make_data_part1.py`), then go to `./factual` and run `make_data_part1.py` and `make_data_part2.py` sequentially.
- Go to `./decoder` and check `run.sh`; pick a task you are interested in and train the decoder. For example: `python train.py --probed_task counting --rebalance 6.0 --save_dir $dir_name --batch_size 256 --num_epoch 100 --data_per_epoch 1000000 --num_test_rollout 200 > ./data_and_model/counting.txt`
- `run.sh` also contains the command for generating preimage samples with the decoder, e.g. `python cache_generation.py --probed_task counting`; the generations will appear in `./training_outputs`. The best way to inspect the generated samples is to go into the `./webAPP` folder and run `streamlit run InversionView.py`. A consolidated sketch of the whole workflow is shown below.
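
Putting the steps above together, here is a minimal shell sketch for the counting task, assuming commands are run from the repository root and reusing the exact flags from the `run.sh` example (`$dir_name` is a placeholder for your checkpoint directory); the ioi/factual data-preparation steps are included as optional commented lines.

```bash
# Optional data preparation (only needed for the ioi and factual tasks; see step 1 above):
#   cd ./ioi && python make_decoder_train_data.py
#   cd ./factual && python make_data_part1.py && python make_data_part2.py

# 1. Train the decoder for the counting task (flags copied from the run.sh example;
#    $dir_name is a placeholder for your chosen checkpoint directory).
cd ./decoder
python train.py --probed_task counting --rebalance 6.0 --save_dir $dir_name \
    --batch_size 256 --num_epoch 100 --data_per_epoch 1000000 \
    --num_test_rollout 200 > ./data_and_model/counting.txt

# 2. Generate preimage samples with the trained decoder;
#    the generations are written under ./training_outputs.
python cache_generation.py --probed_task counting

# 3. Browse the generated samples in the web app.
cd ../webAPP
streamlit run InversionView.py
```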