sparse_circuit_discovery is a clean PyTorch reimplementation, for GPT-2-small, of the Marks et al. (2024) unsupervised circuit discovery algorithm. It also contains a simpler "naive ablations" algorithm for circuit discovery; some results from the ablations algorithm are presented here and here.
🧫
Alignment science
Pinned
- sparse_circuit_discovery (Public): Circuit discovery in GPT-2 small, using sparse autoencoding
- montemac/activation_additions (Public): Algebraic value editing in pretrained language models
- feature-circuits (Public, forked from JacksonKaunismaa/feature-circuits): Discovering and editing interpretable causal graphs in language models (Python)
- simbrain/NeuralNetworksCogSciBook (Public): Neural Networks in Cognitive Science


