Paper: DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs
To set up the environment for the project, create and activate a conda environment using the following commands:
$ conda create --name torch-env pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
$ conda activate torch-env
Then, install the following libraries:
pip install datasets accelerate evaluate matplotlib hydra-core omegaconf peft rouge_score tqdm einops packaging bitsandbytes scipy ninja
You may also need to install additional libraries if required.
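Before running any experiments, you can verify that PyTorch and CUDA are working with a quick sanity check:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # should print the torch version and True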
To perform traditional retraining from scratch, run the following command:
python finetune.py --config-path /home/user_name/project_name/config --config-name finetune.yaml
Make the necessary modifications in finetune.yaml based on your hardware and GPU capacity.
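Because the scripts load their configuration through Hydra, individual settings can also be overridden on the command line instead of editing the file. A minimal sketch, assuming finetune.yaml exposes keys such as batch_size and gradient_accumulation_steps (hypothetical names; check the file for the actual keys):
python finetune.py --config-path /home/user_name/project_name/config --config-name finetune.yaml batch_size=2 gradient_accumulation_steps=8  # key names assumed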
To train a disclosure-protected base model for unlearning, use one of the following commands:
python Train_dp_MLM.py --config-path /home/user_name/project_name/config --config-name Train_dp_MLM.yaml
or
python Train_dp_SGD.py --config-path /home/user_name/project_name/config --config-name Train_dp_SGD.yaml
Make the necessary modifications in Train_dp_MLM.yaml or Train_dp_SGD.yaml based on your hardware and GPU capacity.
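The same Hydra override mechanism applies here. For example, a sketch assuming the DP configs expose the noise and clipping hyperparameters under names such as noise_multiplier and max_grad_norm (hypothetical names; check Train_dp_SGD.yaml for the actual keys):
python Train_dp_SGD.py --config-path /home/user_name/project_name/config --config-name Train_dp_SGD.yaml noise_multiplier=1.1 max_grad_norm=1.0  # key names assumed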
For DP2Unlearning fine-tuning, run:
python FT_BaseModel.py --config-path /home/user_name/project_name/config --config-name FT_BaseModel.yaml
Make the necessary modifications to FT_BaseModel.yaml based on the forgetting percentage (1%: retain99, 5%: retain95, or 10%: retain90).
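For example, assuming the retain split is selected through a key such as split (hypothetical name; check FT_BaseModel.yaml), fine-tuning for a 5% forget set could look like:
python FT_BaseModel.py --config-path /home/user_name/project_name/config --config-name FT_BaseModel.yaml split=retain95  # key name assumed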
To perform approximate unlearning fine-tuning, execute the following:
python forget.py --config-path /home/user_name/project_name/config --config-name forget.yaml
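As with the other scripts, the forget settings can be overridden on the command line. A sketch assuming forget.yaml selects the forget split and the unlearning loss through keys such as split and forget_loss (hypothetical names; check the file):
python forget.py --config-path /home/user_name/project_name/config --config-name forget.yaml split=forget05 forget_loss=grad_ascent  # key names assumed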
To evaluate the models, use this command:
python evaluate_util.py --config-path /home/user_name/project_name/config --config-name eval_everything.yaml
Provide the path to the specific model that you wish to evaluate.
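Assuming the evaluation config takes the checkpoint location through a key such as model_path (hypothetical name; check eval_everything.yaml), this could look like:
python evaluate_util.py --config-path /home/user_name/project_name/config --config-name eval_everything.yaml model_path=/path/to/your/unlearned_model  # key name assumed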
To aggregate the evaluation statistics, use:
python aggregate_eval_stat.py --config-path /home/user_name/project_name/config --config-name aggregate_eval_stat.yaml
Ensure you have the paths to your results:
retain_result=${path_to_traditional_retraining_from_scratch}
ckpt_result=${path_to_your_unlearned_method}
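For example, the two paths can be set as shell variables and passed to the script as Hydra overrides (the file paths below are placeholders; point them at the evaluation results produced in the previous step):
retain_result=/path/to/retrain_from_scratch/eval_results.json  # placeholder path
ckpt_result=/path/to/unlearned_model/eval_results.json  # placeholder path
python aggregate_eval_stat.py --config-path /home/user_name/project_name/config --config-name aggregate_eval_stat.yaml retain_result=$retain_result ckpt_result=$ckpt_result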
To run the Beyond KS Test, execute:
python Beyond_KS_test.py --config-path /home/user_name/project_name/config --config-name aggregate_eval_stat.yaml
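Since the Beyond KS Test reads the same aggregate_eval_stat.yaml config, the same path overrides apply:
python Beyond_KS_test.py --config-path /home/user_name/project_name/config --config-name aggregate_eval_stat.yaml retain_result=$retain_result ckpt_result=$ckpt_result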
The baseline methods are implemented following [1].