Skip to content
/ PTST Public

Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"

License

Notifications You must be signed in to change notification settings

vfleaking/PTST

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PTST

Code for the safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" (https://arxiv.org/abs/2402.18540)

Code for GPT Using OpenAI API

Fine-tuning

Go to the folder gpt-api and see run-gpt-gsm.sh for an example shell script to fine-tune gpt-3.5-turbo-0613 on GSM8K.

  • The code will automatically output the ids of the fine-tuning job and the fine-tuned model, and log them to WandB.
  • You can also view the training curves when the training ends on WandB.
  • See gpt-api/prompt_utils.py for all prompt templates.

Inference

Coming soon!

Code for Llama

Fine-tuning

The code for llama-2 finetuning is under the llama2 folder. To finetune on the ChatDoctor dataset, please

  1. Download the dataset here
  2. Move the json file to llama2/medical_dataset/.
  3. Fill in your wandb project name and user name in lines 57 and 58 in finetuning.py.
  4. Run train-chatdoctor-lora.sh under the llama2 folder.

Inference

inference.py is a variant of Llama's inference code but with multi-gpu support.

python inference.py \
    <path-to-model>
    --peft_model <path-to-peft> \
    --prompt_file vfleaking/DirectHarm4 \
    --prompt_template_style gsm:chat:llama \
    --output <output-file> \
    --top_p 0 --freq 8
  • prompt_file: can be vfleaking/DirectHarm4, https://huggingface.co/datasets/vfleaking/GSM-Danger or data/advbench-harmful-behaviors.csv
  • prompt_template_style: See prompt_utils.py for possible options.
  • freq: the batch size

Safety Test

gpt4_eval.py is a multi-thread variant of gpt4_eval.py from Qi et al. (2023). Please set your OpenAI API key before running the evaluation command:

python safety_evaluation/gpt4_eval.py --input_file question_output/example.jsonl
  • input_file: a jsonl file with each line containing the input prompt and the model response.
  • The output of the GPT-4 judge will be saved under safety_evaluation/gpt4_eval_output.

Citation Information

@article{lyu2024keeping,
  title={Keeping {LLMs} Aligned After Fine-tuning: The Crucial Role of Prompt Templates},
  author={Kaifeng Lyu and Haoyu Zhao and Xinran Gu and Dingli Yu and Anirudh Goyal and Sanjeev Arora},
  journal={arXiv preprint arXiv:2402.18540},
  year={2024}
}

About

Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •