FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model
Official implementation of "FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model", accepted at NeurIPS 2025. Details are available in the 📄 Paper.
FALCON is a representation-guided unlearning framework for Large Language Models (LLMs) that addresses the critical challenge of selectively removing undesired knowledge while preserving model utility.
Schematic overview of FALCON. The pipeline comprises three stages: parameter selection based on mutual information (Step 1); contrastive orthogonal unalignment, which consists of contrastive mechanism on both forgetting and retention datasets (Step 2.1) and orthogonal gradient conflict resolution (Step 2.2); and model unlearning guided by these components (Step 3).
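To make the two training-time components concrete, below is a minimal, hypothetical Python sketch of Steps 2.1 and 2.2. It is illustrative only: every function and variable name (`contrastive_unalignment_loss`, `resolve_gradient_conflict`, the cosine-similarity objective, and the PCGrad-style projection) is an assumption for exposition, not the repository's actual API.

```python
# Conceptual sketch of FALCON's Steps 2.1-2.2 -- NOT the official implementation.
# All names below are hypothetical illustrations.
import torch
import torch.nn.functional as F

def contrastive_unalignment_loss(forget_acts, retain_acts,
                                 ref_forget_acts, ref_retain_acts):
    """Step 2.1 (sketch): push forget-set activations away from their
    reference (frozen-model) representations while keeping retain-set
    activations aligned with theirs."""
    forget_sim = F.cosine_similarity(forget_acts, ref_forget_acts, dim=-1)
    retain_sim = F.cosine_similarity(retain_acts, ref_retain_acts, dim=-1)
    # Minimizing this drives forget similarity down and retain similarity up.
    return forget_sim.mean() - retain_sim.mean()

def resolve_gradient_conflict(g_forget, g_retain):
    """Step 2.2 (sketch): when the forgetting gradient opposes the retention
    gradient, remove the conflicting component (a PCGrad-style projection)."""
    dot = torch.dot(g_forget.flatten(), g_retain.flatten())
    if dot < 0:  # gradients conflict
        g_forget = g_forget - (dot / g_retain.norm() ** 2) * g_retain
    return g_forget

# Tiny usage example with random activations:
f, r = torch.randn(8, 16), torch.randn(8, 16)
loss = contrastive_unalignment_loss(f, r, torch.randn(8, 16), torch.randn(8, 16))
```

The projection mirrors PCGrad-style conflict resolution: when forgetting and retention updates point in opposing directions, the forgetting gradient is made orthogonal to the retention gradient, so unlearning proceeds without directly eroding retained knowledge.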
```bash
# Clone the repository
git clone https://github.com/CharlesJW222/FALCON.git
cd FALCON

# Create conda environment from yaml file
conda env create -f environment.yaml
conda activate falcon

# Install dependencies
pip install -r requirements.txt
```

Identify optimal layers for unlearning using information-theoretic guidance:
```bash
# TOFU dataset analysis
python run_MI.py \
    --dataset tofu \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --forget-config forget10 \
    --retain-config retain90 \
    --ratio 1.0 \
    --all-layers
```
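As a rough intuition for Step 1, MI-guided selection ranks layers by how informative their activations are about forget-set membership. The sketch below illustrates that idea with scikit-learn's `mutual_info_classif` estimator; it is a hypothetical stand-in, not the actual method implemented in `run_MI.py`, and the `score_layers` helper is invented for illustration.

```python
# Conceptual sketch of MI-guided layer selection (hypothetical; not the
# logic of run_MI.py): score each layer by how much information its
# activations carry about whether an example belongs to the forget set.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def score_layers(layer_activations, labels):
    """layer_activations: dict {layer_idx: (n_examples, hidden_dim) array};
    labels: 1 for forget-set examples, 0 for retain-set examples."""
    scores = {}
    for layer, acts in layer_activations.items():
        # Mean per-feature MI as a cheap proxy for the layer's relevance.
        scores[layer] = mutual_info_classif(acts, labels).mean()
    return scores

# Example with random activations for a 4-layer model:
rng = np.random.default_rng(0)
acts = {i: rng.normal(size=(64, 16)) for i in range(4)}
labels = rng.integers(0, 2, size=64)
scores = score_layers(acts, labels)
print("highest-scoring layer:", max(scores, key=scores.get))
```

Layers that score highest under such a criterion are the natural targets for the subsequent unlearning stages, since intervening there affects the forget set most while leaving other representations comparatively untouched.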
We provide a Jupyter notebook for hands-on exploration of unlearning:

```bash
jupyter notebook quick_start.ipynb
```

If you find our paper's idea useful in your research, please cite:
```bibtex
@inproceedings{hu2025falcon,
  title={{FALCON}: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model},
  author={Jinwei Hu and Zhenglin Huang and Xiangyu Yin and Wenjie Ruan and Guangliang Cheng and Yi Dong and Xiaowei Huang},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=BDKkFwskot}
}
```

FALCON is designed for responsible AI deployment. While our method enables selective knowledge removal from LLMs, users must ensure:
- Compliance with applicable regulations (e.g., GDPR's "right to be forgotten")
- Ethical considerations in determining what knowledge should be unlearned
- Transparency in communicating model capabilities and limitations
- Ongoing monitoring for unintended consequences
The authors do not endorse using this technology for malicious purposes or circumventing legitimate safety mechanisms.
FALCON is evaluated on the following unlearning benchmarks:
- WMDP: Weapons of Mass Destruction Proxy
- TOFU: Task of Fictitious Unlearning
- MUSE: Machine Unlearning Six-Way Evaluation
⭐ Star us on GitHub if you find FALCON useful!
Made with ❤️ by the University of Liverpool TACPS Lab