sidebar_position
2

🟢 Prompt Leaking

Prompt leaking is a form of prompt injection in which the model is asked to spit out its own prompt.

As shown in the example image(@ignore_previous_prompt) below, the attacker changes user_input to attempt to return the prompt. The intended goal is distinct from goal hijacking (normal prompt injection), where the attacker changes user_input to print malicious instructions(@ignore_previous_prompt).

import research from '@site/docs/assets/jailbreak/jailbreak_research.png';

The following image(@simon2022inject), again from the remoteli.io example, shows a Twitter user getting the model to leak its prompt.

import Image from '@site/docs/assets/jailbreak/injection_leak.png';

Well, so what? Why should anyone care about prompt leaking?

Sometimes people want to keep their prompts secret. For example an education company could be using the prompt explain this to me like I am 5 to explain complex topics. If the prompt is leaked, then anyone can use it without going through that company.

Microsoft Bing Chat

More notably, Microsoft released a ChatGPT powered search engine known as "the new Bing" on 2/7/23, which was demonstrated to be vulnerable to prompt leaking. The following example by @kliu128 demonstrates how given an earlier version of Bing Search, code-named "Sydney", was susceptible when giving a snippet of its prompt(@kevinbing). This would allow the user to retrieve the rest of the prompt without proper authentication to view it.

import bing from '@site/docs/assets/jailbreak/bing_chat.png';

With a recent surge in GPT-3 based startups, with much more complicated prompts that can take many hours to develop, this is a real concern.

Practice

Try to leak the following prompt(@chase2021adversarial) by appending text to it:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

leaking.md

leaking.md

🟢 Prompt Leaking

Microsoft Bing Chat

Practice

Files

leaking.md

Latest commit

History

leaking.md

File metadata and controls

🟢 Prompt Leaking

Microsoft Bing Chat

Practice