Control Illusion: The Failure of Instruction Hierarchies in Large Language Models

This repository contains the data and the code for the paper "Control Illusion: The Failure of Instruction Hierarchies in Large Language Models" (http://arxiv.org/abs/2502.15851)

Dataset Creation

python code/synthesize_conflicting_data.py
python code/synthesize_conflicting_data_rich_context.py
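As a rough illustration of what a synthesized conflict might look like, here is a minimal sketch that pairs a higher-priority constraint with a contradictory lower-priority one. The function name and field names are hypothetical; the repository's actual schema is defined in code/synthesize_conflicting_data.py.

```python
# Hypothetical sketch of a conflicting-instruction pair; field names are
# illustrative and not the repository's actual data schema.

def make_conflict_pair(base_prompt: str, constraint: str, conflicting_constraint: str) -> dict:
    """Pair a system-level constraint with a contradictory user-level one."""
    return {
        "system": constraint,
        "user": f"{base_prompt} {conflicting_constraint}",
    }

pair = make_conflict_pair(
    "Summarize the following article.",
    "Always respond in English.",
    "Respond only in French.",
)
print(pair["system"])  # Always respond in English.
```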

Getting Responses

python code/get_responses.py

You will need to set up your favorite LLM clients in llm_api.py.
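A minimal sketch of the kind of client wrapper such a module might hold, assuming each client maps a (system, user) message pair to a response string. The class, registry, and function names here are hypothetical; the repository's actual interface may differ.

```python
# Hypothetical sketch of an LLM client registry; names are illustrative,
# not the repository's actual llm_api.py interface.

class EchoClient:
    """Stand-in client for local testing; a real client would call an LLM API."""

    def get_response(self, system_prompt: str, user_prompt: str) -> str:
        return f"[system: {system_prompt}] {user_prompt}"

# Register one client per model name so callers can look them up uniformly.
CLIENTS = {"echo": EchoClient()}

def get_response(model: str, system_prompt: str, user_prompt: str) -> str:
    return CLIENTS[model].get_response(system_prompt, user_prompt)
```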

Processing Responses

python code/process_responses.py --target_dir {response_directory} --output_dir {processed_directory}
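The processing step presumably normalizes raw model output before evaluation. A minimal sketch under that assumption, with a hypothetical final-answer marker; the actual logic lives in code/process_responses.py.

```python
# Hypothetical sketch of response post-processing; the marker string is an
# assumption, not the repository's actual convention.

def process_response(raw: str) -> str:
    """Strip whitespace and keep only the text after a final-answer marker, if present."""
    marker = "Final answer:"
    if marker in raw:
        raw = raw.split(marker, 1)[1]
    return raw.strip()

print(process_response("Reasoning...\nFinal answer: Paris"))  # Paris
```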

Evaluation and Analysis

On raw LLM responses:

python code/analyze.py --target_dir {response_directory} --response_type response

On processed LLM responses:

python code/analyze.py --target_dir {processed_directory} --response_type processed_response
python code/make_tables.py
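To make the analysis step concrete, here is a sketch of one aggregate metric an instruction-hierarchy evaluation might report: the fraction of responses that followed the higher-priority instruction. The metric name and record fields are hypothetical; the repository's actual metrics are defined in code/analyze.py.

```python
# Hypothetical compliance metric; field names are illustrative, not the
# repository's actual analysis output format.

def priority_compliance_rate(results: list) -> float:
    """Fraction of responses that followed the higher-priority instruction."""
    if not results:
        return 0.0
    followed = sum(1 for r in results if r["followed"] == "primary")
    return followed / len(results)

sample = [
    {"followed": "primary"},
    {"followed": "conflicting"},
    {"followed": "primary"},
]
print(priority_compliance_rate(sample))
```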
