
Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning


Implementation for our ICML 2025 paper Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning.

VAA is a safety alignment method that upweights and reinforces vulnerable data to enhance safety retention during customized fine-tuning.

Getting Started

1. Environment Setup

Create the Conda environment using the provided environment.yml:

conda env create -f environment.yml

If you encounter missing dependencies, please install them manually.
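
For reference, a typical setup after creating the environment looks like the following. The environment name vaa is an assumption (use the name: field in environment.yml), and the package name is a placeholder for whatever dependency is reported missing.

# Activate the environment (replace "vaa" with the name given in environment.yml)
conda activate vaa
# Install a missing dependency manually (package name is a placeholder)
pip install <missing-package>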

2. Data and Resources

Prepare experimental datasets using:

bash script/build_dataset.sh

Models should be downloaded and placed in the paths specified in the scripts.
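
For example, models can be fetched with huggingface-cli. The model identifiers and target directories below are assumptions for illustration only; match them to the paths referenced in the scripts under script/.

# Example only: adjust repo IDs and --local-dir to the paths used in the scripts
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir models/qwen
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir models/llama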

3. Safety Alignment

Run the following scripts to perform safety alignment with VAA and baselines:

# model = qwen | llama
bash script/vaa_pipeline_{model}.sh   # VAA pipeline (also supports ERM and Vaccine)
bash script/repnoise_{model}.sh       # RepNoise baseline
bash script/booster_{model}.sh        # Booster baseline
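
For example, to run the full VAA pipeline on the Llama model:

bash script/vaa_pipeline_llama.sh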

4. Harmful Fine-Tuning & Evaluation

Run the following script to perform harmful fine-tuning (HFT) and evaluate the resulting models on various downstream datasets.

These steps are also integrated into the main training pipeline.

bash script/run_all_downstream.sh

Citation

If you find our work helpful, please cite:

@misc{chen2025VAA,
      title={Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning}, 
      author={Liang Chen and Xueting Han and Li Shen and Jing Bai and Kam-Fai Wong},
      year={2025},
      eprint={2506.03850},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.03850}, 
}

This implementation partially builds upon the Vaccine framework. We thank the authors for releasing their codebase.

@article{huang2024vaccine,
  title={Vaccine: Perturbation-aware alignment for large language model},
  author={Huang, Tiansheng and Hu, Sihao and Liu, Ling},
  journal={arXiv preprint arXiv:2402.01109},
  year={2024}
}
