efarrell1/unlearning_with_saes

Unlearning with SAEs

This repository contains the code we used to generate the results in 'Applying sparse autoencoders to unlearn knowledge in language models'. The most useful functions are in the `unlearning` folder.

