This repository contains the data and code for the experiments in the PromptLeakage EMNLP 2024 paper.
Figure: Our standardized task setup for evaluating LLM vulnerability against multi-turn prompt leakage
Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats and mitigation strategies is lacking, especially for multi-turn LLM interactions. In this paper, we systematically investigate LLM vulnerabilities against prompt leakage for 10 closed- and open-source LLMs, across four domains. We design a unique threat model that leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting. Our standardized setup further allows dissecting the leakage of specific prompt contents, such as task instructions and knowledge documents. We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts. We present different combinations of defenses against our threat model, including a cost analysis. Our study highlights key takeaways for building secure LLM applications and provides directions for research in multi-turn LLM interactions.
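As a rough illustration of the ASR metric, the sketch below counts the fraction of conversations in which the system prompt shows up verbatim in a model response. The substring-matching heuristic and the conversation fields (system_prompt, responses) are assumptions for illustration, not the paper's exact leakage detector.

def attack_success_rate(conversations):
    """Fraction of conversations in which the (assumed) system prompt leaks.

    conversations: list of dicts with 'system_prompt' (str) and 'responses' (list of str).
    """
    leaked = 0
    for conv in conversations:
        secret = conv["system_prompt"].lower()
        # Count the conversation as a successful attack if any turn echoes the prompt.
        if any(secret in resp.lower() for resp in conv["responses"]):
            leaked += 1
    return leaked / len(conversations) if conversations else 0.0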
We publicly release the datasets used for each of the four domains (news, legal, finance, and medical) in the data/ directory. The finetuning data can be found at safetyfinetuning/prompt-leakage-finetune.csv.
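A minimal sketch for inspecting the released data with pandas; the domain file name data/news.csv is an assumption, so substitute the actual file names found in data/.

import pandas as pd

# Hypothetical domain file name; replace with the actual files in data/.
news = pd.read_csv("data/news.csv")
# Finetuning data path as given above.
finetune = pd.read_csv("safetyfinetuning/prompt-leakage-finetune.csv")

print(news.head())
print(finetune.columns.tolist())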
export ANTHROPIC_API_KEY=...
export COHERE_API_KEY=...
export GOOGLE_APPLICATION_CREDENTIALS=...
export OPENAI_API_KEY=...
bash ollama_init.sh
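Before launching inference, a quick sanity check (not part of the repository's scripts) can confirm the credentials above are set:

import os

required = [
    "ANTHROPIC_API_KEY",
    "COHERE_API_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "OPENAI_API_KEY",
]
# Fail early with a clear message if any provider credential is missing.
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")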
python run_inference_no_defenses.py
python run_inference_all_defenses.py
@article{agarwal2024promptleakageeffectdefense,
  title={Prompt Leakage effect and defense strategies for multi-turn LLM interactions},
  author={Divyansh Agarwal and Alexander R. Fabbri and Ben Risher and Philippe Laban and Shafiq Joty and Chien-Sheng Wu},
  year={2024},
  eprint={2404.16251},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2404.16251},
}
Please create a GitHub issue if you have any questions, suggestions, requests, or bug reports. We welcome PRs!
