FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model
Official implementation of "FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model", accepted at NeurIPS 2025. Details are available in the 📄 Paper.
FALCON is a representation-guided unlearning framework for Large Language Models (LLMs) that addresses the critical challenge of selectively removing undesired knowledge while preserving model utility.
Schematic overview of FALCON. The pipeline comprises three stages: parameter selection based on mutual information (Step 1); contrastive orthogonal unalignment, which consists of contrastive mechanism on both forgetting and retention datasets (Step 2.1) and orthogonal gradient conflict resolution (Step 2.2); and model unlearning guided by these components (Step 3).
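To make the two training-time components concrete, below is a minimal, hypothetical Python sketch of Steps 2.1 and 2.2. It is illustrative only: every function and variable name (`contrastive_unalignment_loss`, `resolve_gradient_conflict`, the cosine-similarity objective, and the PCGrad-style projection) is an assumption for exposition, not the repository's actual API.

```python
# Conceptual sketch of FALCON's Steps 2.1-2.2 -- NOT the official implementation.
# All names below are hypothetical illustrations.
import torch
import torch.nn.functional as F

def contrastive_unalignment_loss(forget_acts, retain_acts,
                                 ref_forget_acts, ref_retain_acts):
    """Step 2.1 (sketch): push forget-set activations away from their
    reference (frozen-model) representations while keeping retain-set
    activations aligned with theirs."""
    forget_sim = F.cosine_similarity(forget_acts, ref_forget_acts, dim=-1)
    retain_sim = F.cosine_similarity(retain_acts, ref_retain_acts, dim=-1)
    # Minimizing this drives forget similarity down and retain similarity up.
    return forget_sim.mean() - retain_sim.mean()

def resolve_gradient_conflict(g_forget, g_retain):
    """Step 2.2 (sketch): when the forgetting gradient opposes the retention
    gradient, remove the conflicting component (a PCGrad-style projection)."""
    dot = torch.dot(g_forget.flatten(), g_retain.flatten())
    if dot < 0:  # gradients conflict
        g_forget = g_forget - (dot / g_retain.norm() ** 2) * g_retain
    return g_forget

# Tiny usage example with random activations:
f, r = torch.randn(8, 16), torch.randn(8, 16)
loss = contrastive_unalignment_loss(f, r, torch.randn(8, 16), torch.randn(8, 16))
```

The projection mirrors PCGrad-style conflict resolution: when forgetting and retention updates point in opposing directions, the forgetting gradient is made orthogonal to the retention gradient, so unlearning proceeds without directly eroding retained knowledge.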
```bash
# Clone the repository
git clone https://github.com/CharlesJW222/FALCON.git
cd FALCON

# Create conda environment from yaml file
conda env create -f environment.yaml
conda activate falcon

# Install dependencies
pip install -r requirements.txt
```

Identify optimal layers for unlearning using information-theoretic guidance:
```bash
# TOFU dataset analysis
python run_MI.py \
    --dataset tofu \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --forget-config forget10 \
    --retain-config retain90 \
    --ratio 1.0 \
    --all-layers
```
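As a rough intuition for Step 1, MI-guided selection ranks layers by how informative their activations are about forget-set membership. The sketch below illustrates that idea with scikit-learn's `mutual_info_classif` estimator; it is a hypothetical stand-in, not the actual method implemented in `run_MI.py`, and the `score_layers` helper is invented for illustration.

```python
# Conceptual sketch of MI-guided layer selection (hypothetical; not the
# logic of run_MI.py): score each layer by how much information its
# activations carry about whether an example belongs to the forget set.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def score_layers(layer_activations, labels):
    """layer_activations: dict {layer_idx: (n_examples, hidden_dim) array};
    labels: 1 for forget-set examples, 0 for retain-set examples."""
    scores = {}
    for layer, acts in layer_activations.items():
        # Mean per-feature MI as a cheap proxy for the layer's relevance.
        scores[layer] = mutual_info_classif(acts, labels).mean()
    return scores

# Example with random activations for a 4-layer model:
rng = np.random.default_rng(0)
acts = {i: rng.normal(size=(64, 16)) for i in range(4)}
labels = rng.integers(0, 2, size=64)
scores = score_layers(acts, labels)
print("highest-scoring layer:", max(scores, key=scores.get))
```

Layers that score highest under such a criterion are the natural targets for the subsequent unlearning stages, since intervening there affects the forget set most while leaving other representations comparatively untouched.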
We provide a Jupyter notebook for hands-on exploration of unlearning:

```bash
jupyter notebook quick_start.ipynb
```

If you find our paper's idea useful in your research, please cite:
```bibtex
@inproceedings{hu2025falcon,
  title={{FALCON}: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model},
  author={Jinwei Hu and Zhenglin Huang and Xiangyu Yin and Wenjie Ruan and Guangliang Cheng and Yi Dong and Xiaowei Huang},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=BDKkFwskot}
}
```

FALCON is designed for responsible AI deployment. While our method enables selective knowledge removal from LLMs, users must ensure:
- Compliance with applicable regulations (e.g., GDPR's "right to be forgotten")
- Ethical considerations in determining what knowledge should be unlearned
- Transparency in communicating model capabilities and limitations
- Ongoing monitoring for unintended consequences
The authors do not endorse using this technology for malicious purposes or circumventing legitimate safety mechanisms.
FALCON is evaluated on the following unlearning benchmarks:
- WMDP: Weapons of Mass Destruction Proxy
- TOFU: Task of Fictitious Unlearning
- MUSE: Machine Unlearning Six-Way Evaluation
⭐ Star us on GitHub if you find FALCON useful!
Made with ❤️ by the University of Liverpool TACPS Lab