# Alleviating the Fear of Losing Alignment in LLM Fine-tuning

Official code for the IEEE S&P 2025 paper "Alleviating the Fear of Losing Alignment in LLM Fine-tuning": a framework for evaluating and improving language model alignment through fine-tuning and parameter-recovery techniques.
## Overview

This project provides tools for:

- Fine-tuning language models with alignment objectives
- Recovering model parameters after harmful fine-tuning
- Evaluating model performance and safety
- Supporting multiple LLM architectures (Llama2, Gemma, Mistral, Qwen)
## Features

### Fine-tuning (`run_finetune_exp.py`)

- Supports LoRA-based fine-tuning (see the sketch below)
- Handles both benign and harmful training data
- Configurable training parameters
- Supports multiple LLM architectures
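As a point of reference, this is roughly how LoRA fine-tuning is set up with the Hugging Face `peft` library. The hyperparameters below are illustrative only; the configuration actually used lives in `run_finetune_exp.py` and `utils/lora_utils.py`.

```python
# Minimal LoRA setup sketch using Hugging Face peft. Hyperparameters are
# illustrative, not the exact values used by run_finetune_exp.py.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_cfg = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the LoRA updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)   # only the LoRA weights stay trainable
model.print_trainable_parameters()
```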
### Parameter Recovery (`sgdg_rollback_final.py`, `run_recover_exp.py`)

- Implements gradient-guided parameter recovery (sketched below)
- Supports multi-GPU training
- Features warmup steps and rollback mechanisms
- Configurable recovery rates and thresholds
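The core idea of gradient-guided recovery, in a deliberately minimal sketch: the parameters with the largest gradient magnitudes on alignment data are rolled back toward the original aligned weights. This is conceptual only, not the `sgdg_rollback_final.py` implementation, which additionally handles warmup steps, rollback, and multi-GPU execution.

```python
# Conceptual sketch of gradient-guided parameter recovery. The real algorithm
# (sgdg_rollback_final.py) adds warmup steps, rollback, and multi-GPU support.
import torch

@torch.no_grad()
def recover_step(model, aligned_state, grads, recover_rate=0.01):
    """Copy back aligned values for the parameters whose alignment-loss
    gradient magnitude falls in the top `recover_rate` fraction per tensor."""
    for name, param in model.named_parameters():
        g = grads[name].abs()
        k = max(1, int(recover_rate * g.numel()))
        threshold = torch.topk(g.flatten(), k).values.min()  # top-k cutoff
        mask = g >= threshold
        param.data[mask] = aligned_state[name][mask]         # restore aligned weights
```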
### Evaluation (`run_eval_exp.py`)

- Measures model performance on various tasks
- Evaluates model safety and harmful behaviors (an illustrative metric follows)
- Supports multiple evaluation datasets
- Tracks metrics across recovery steps
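For intuition, safety evaluation often reduces to measuring how frequently the model refuses harmful prompts. Below is a deliberately simple keyword-based sketch; the actual criteria used by `run_eval_exp.py` may differ.

```python
# Illustrative keyword-based refusal metric; run_eval_exp.py's actual
# safety evaluation may use a different (more robust) criterion.
REFUSAL_MARKERS = ("I cannot", "I can't", "I'm sorry", "Sorry,")

def refusal_rate(responses: list[str]) -> float:
    """Fraction of model responses that contain a refusal phrase."""
    refused = sum(any(m in r for m in REFUSAL_MARKERS) for r in responses)
    return refused / max(1, len(responses))
```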
### Results Analysis (`run_res.py`)

- Analyzes experimental results
- Processes metrics across different models and tasks
- Generates comparative analysis (an illustrative aggregation follows)
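As an illustration of the kind of comparative view `run_res.py` produces, one might pivot per-model, per-task metrics as below. The column names and zero-valued scores are placeholders, not the repository's actual output format.

```python
# Hypothetical results aggregation; column names and values are placeholders.
import pandas as pd

records = [
    {"model": "llama2-7b", "task": "sql",     "recover_step": 0, "score": 0.0},
    {"model": "llama2-7b", "task": "nl2bash", "recover_step": 0, "score": 0.0},
]
df = pd.DataFrame(records)
print(df.pivot_table(index="model", columns="task", values="score"))
```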
## Supported Models

- Llama2 (7B, 13B)
- Gemma 2B
- Mistral v2 7B
- Qwen 7B

You can add more; a hypothetical registration sketch follows.
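Adding a model generally amounts to registering its checkpoint path. Below is a hypothetical mapping in the spirit of `utils/constant.py`; the real constant names and paths may differ.

```python
# Hypothetical short-name -> Hugging Face checkpoint mapping; the real
# constants live in utils/constant.py and may be named differently.
MODEL_PATHS = {
    "llama2-7b":  "meta-llama/Llama-2-7b-hf",
    "llama2-13b": "meta-llama/Llama-2-13b-hf",
    "gemma-2b":   "google/gemma-2b",
    "mistral-7b": "mistralai/Mistral-7B-Instruct-v0.2",  # Mistral v2 7B
    "qwen-7b":    "Qwen/Qwen-7B",
}
```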
## Supported Tasks

- SQL
- Cheat detection
- NL2Bash conversion (example below)
- Text summarization
- Toxicity detection
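For example, the NL2Bash task maps a natural-language request to a shell command. A hypothetical sample is shown below; see `dataset/` for the actual data and its format.

```python
# Hypothetical NL2Bash sample; the actual schema is defined by dataset/.
sample = {
    "instruction": "List all files in the current directory, largest first.",
    "output": "ls -lS",
}
```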
## Installation

1. Clone the repository:

```bash
git clone https://github.com/kangyangWHU/LLMAlignment.git
cd LLMAlignment
```

2. Install dependencies:

```bash
# Step 1: Create a conda environment
conda create -n myenv python=3.9
# Step 2: Activate the environment
conda activate myenv
# Step 3: Install PyTorch
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
# Step 4: Install the remaining requirements via pip
pip install -r requirements.txt
```

If you use the Gemma series, you also need to install FlashAttention, which requires CUDA > 12.0:

```bash
pip install flash-attn==2.6.1
```

## Usage

Run the pipeline stages in order:

```bash
python run_finetune_exp.py   # fine-tuning experiments
python run_recover_exp.py    # parameter recovery experiments
python run_eval_exp.py       # evaluation pipeline
python run_res.py            # results analysis
```

## Project Structure

```
LLMAlignment/
├── run_finetune_exp.py      # Fine-tuning experiments
├── sgdg_rollback_final.py   # Parameter recovery implementation
├── run_recover_exp.py       # Parameter recovery experiments
├── run_eval_exp.py          # Evaluation pipeline
├── run_res.py               # Results analysis
├── utils/                   # Utility functions
│   ├── constant.py          # Constants and mappings
│   ├── inference_utils.py   # Inference helpers
│   ├── lora_utils.py        # LoRA utilities
│   └── res_utils.py         # Results processing
├── dataset/                 # Datasets
└── cfg/                     # Configuration files
```
## Key Features

- **Multi-GPU Support**
  - Distributed training and evaluation
  - Efficient parameter recovery across multiple GPUs
- **Flexible Evaluation**
  - Support for multiple tasks
  - Customizable evaluation metrics
  - Safety evaluation
- **Parameter Recovery**
  - Gradient-guided recovery
  - Configurable recovery strategies
  - Progress tracking and checkpointing
- **Modular Design**
  - Easy to extend to new models
  - Configurable components
  - Reusable utilities
## Citation

If you use this code in your research, please cite:

```bibtex
@INPROCEEDINGS{yang2025alleviating,
  author    = {Yang, Kang and Tao, Guanhong and Chen, Xun and Xu, Jun},
  booktitle = {2025 IEEE Symposium on Security and Privacy (SP)},
  title     = {{Alleviating the Fear of Losing Alignment in LLM Fine-tuning}},
  year      = {2025},
  month     = may,
  pages     = {2004--2022},
  ISSN      = {2375-1207},
  doi       = {10.1109/SP61157.2025.00171},
  url       = {https://doi.ieeecomputersociety.org/10.1109/SP61157.2025.00171},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA}
}
```