Control Illusion: The Failure of Instruction Hierarchies in Large Language Models

This repository contains the data and the code for the paper "Control Illusion: The Failure of Instruction Hierarchies in Large Language Models" (http://arxiv.org/abs/2502.15851)

Dataset Creation

python code/synthesize_conflicting_data.py
python code/synthesize_conflicting_data_rich_context.py
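As a rough illustration of what a synthesized conflict might look like, here is a minimal sketch that pairs a higher-priority constraint with a contradictory lower-priority one. The function name and field names are hypothetical; the repository's actual schema is defined in code/synthesize_conflicting_data.py.

```python
# Hypothetical sketch of a conflicting-instruction pair; field names are
# illustrative and not the repository's actual data schema.

def make_conflict_pair(base_prompt: str, constraint: str, conflicting_constraint: str) -> dict:
    """Pair a system-level constraint with a contradictory user-level one."""
    return {
        "system": constraint,
        "user": f"{base_prompt} {conflicting_constraint}",
    }

pair = make_conflict_pair(
    "Summarize the following article.",
    "Always respond in English.",
    "Respond only in French.",
)
print(pair["system"])  # Always respond in English.
```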

Getting Responses

python code/get_responses.py

You will need to set up your favorite LLM clients in llm_api.py.
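A minimal sketch of the kind of client wrapper such a module might hold, assuming each client maps a (system, user) message pair to a response string. The class, registry, and function names here are hypothetical; the repository's actual interface may differ.

```python
# Hypothetical sketch of an LLM client registry; names are illustrative,
# not the repository's actual llm_api.py interface.

class EchoClient:
    """Stand-in client for local testing; a real client would call an LLM API."""

    def get_response(self, system_prompt: str, user_prompt: str) -> str:
        return f"[system: {system_prompt}] {user_prompt}"

# Register one client per model name so callers can look them up uniformly.
CLIENTS = {"echo": EchoClient()}

def get_response(model: str, system_prompt: str, user_prompt: str) -> str:
    return CLIENTS[model].get_response(system_prompt, user_prompt)
```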

Processing Responses

python code/process_responses.py --target_dir {response_directory} --output_dir {processed_directory}
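The processing step presumably normalizes raw model output before evaluation. A minimal sketch under that assumption, with a hypothetical final-answer marker; the actual logic lives in code/process_responses.py.

```python
# Hypothetical sketch of response post-processing; the marker string is an
# assumption, not the repository's actual convention.

def process_response(raw: str) -> str:
    """Strip whitespace and keep only the text after a final-answer marker, if present."""
    marker = "Final answer:"
    if marker in raw:
        raw = raw.split(marker, 1)[1]
    return raw.strip()

print(process_response("Reasoning...\nFinal answer: Paris"))  # Paris
```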

Evaluation and Analysis

On raw LLM responses:

python code/analyze.py --target_dir {response_directory} --response_type response

On processed LLM responses:

python code/analyze.py --target_dir {processed_directory} --response_type processed_response
python code/make_tables.py
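To make the analysis step concrete, here is a sketch of one aggregate metric an instruction-hierarchy evaluation might report: the fraction of responses that followed the higher-priority instruction. The metric name and record fields are hypothetical; the repository's actual metrics are defined in code/analyze.py.

```python
# Hypothetical compliance metric; field names are illustrative, not the
# repository's actual analysis output format.

def priority_compliance_rate(results: list) -> float:
    """Fraction of responses that followed the higher-priority instruction."""
    if not results:
        return 0.0
    followed = sum(1 for r in results if r["followed"] == "primary")
    return followed / len(results)

sample = [
    {"followed": "primary"},
    {"followed": "conflicting"},
    {"followed": "primary"},
]
print(priority_compliance_rate(sample))
```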
