# Vulnerability-Aware Alignment (VAA)

Implementation of our ICML 2025 paper **Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning**.

VAA is a safety alignment method that upweights and reinforces vulnerable data to enhance safety retention during customized fine-tuning.
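At a high level, upweighting vulnerable data amounts to training on a weighted loss in which easily forgotten examples receive larger weights. The sketch below is illustrative only: the softmax-style weighting and the function names (`vulnerability_weights`, `weighted_loss`) are stand-ins, not the paper's actual implementation.

```python
import numpy as np

def vulnerability_weights(scores, temperature=1.0):
    """Map per-example vulnerability scores to sample weights.

    Higher scores (examples whose safety behavior is more easily
    forgotten) receive larger weights. This softmax-style scheme is
    a stand-in for the paper's actual weighting rule.
    """
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.max()) / temperature  # stabilize the exponent
    w = np.exp(z)
    return w * len(scores) / w.sum()  # normalize so the mean weight is 1.0

def weighted_loss(per_example_losses, weights):
    """Upweighted objective: mean of weighted per-example losses."""
    return float(np.mean(np.asarray(per_example_losses) * weights))
```

With uniform weights this reduces to the ordinary empirical risk (ERM), so the weighting only changes how much each example contributes, not the underlying loss.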
## Installation

Create the Conda environment from the provided `environment.yml`:

```shell
conda env create -f environment.yml
```

If you encounter missing dependencies, please install them manually.
## Dataset Preparation

Prepare the experimental datasets:

```shell
bash script/build_dataset.sh
```

Models should be downloaded and placed in the paths specified in the scripts.
## Safety Alignment

Run the following scripts to perform safety alignment with VAA and the baselines:

```shell
# model = qwen | llama
bash script/vaa_pipeline_{model}.sh   # supports ERM and Vaccine
bash script/repnoise_{model}.sh
bash script/booster_{model}.sh
```

## Harmful Fine-Tuning and Evaluation

Run the following script to perform harmful fine-tuning (HFT) and evaluate the model on various downstream datasets.
These steps are also integrated into the main training pipeline.
```shell
bash script/run_all_downstream.sh
```

## Citation

If you find our work helpful, please cite:
```bibtex
@misc{chen2025VAA,
  title={Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning},
  author={Liang Chen and Xueting Han and Li Shen and Jing Bai and Kam-Fai Wong},
  year={2025},
  eprint={2506.03850},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2506.03850},
}
```

## Acknowledgements

This implementation partially builds upon the Vaccine framework; we thank the authors for releasing their codebase.
```bibtex
@article{huang2024vaccine,
  title={Vaccine: Perturbation-aware alignment for large language model},
  author={Huang, Tiansheng and Hu, Sihao and Liu, Ling},
  journal={arXiv preprint arXiv:2402.01109},
  year={2024}
}
```