This repository contains the data and code for the paper "Control Illusion: The Failure of Instruction Hierarchies in Large Language Models" (http://arxiv.org/abs/2502.15851).

    python code/synthesize_conflicting_data.py
    python code/synthesize_conflicting_data_rich_context.py
    python code/get_responses.py

You will need to set up your favorite LLM clients in llm_api.py.

    python code/process_responses.py --target_dir {response_directory} --output_dir {processed_directory}

On raw LLM responses:

    python code/analyze.py --target_dir {response_directory} --response_type response

On processed LLM responses:

    python code/analyze.py --target_dir {processed_directory} --response_type processed_response

    python code/make_tables.py
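
The commands above can also be chained from a small driver script. The following is only a sketch, not part of the repository: it assumes the scripts are meant to run in the order listed, that they are invoked from the repository root, and that they need no arguments beyond those shown. The helper names `build_pipeline` and `run_pipeline` and the directory names `responses` and `processed` are hypothetical placeholders.

```python
import subprocess
import sys


def build_pipeline(response_dir, processed_dir):
    """Return the README commands as argument lists, in the order listed.

    Assumption: the listing order is the intended execution order.
    """
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "code/synthesize_conflicting_data.py"],
        [py, "code/synthesize_conflicting_data_rich_context.py"],
        [py, "code/get_responses.py"],
        [py, "code/process_responses.py",
         "--target_dir", response_dir, "--output_dir", processed_dir],
        # Analysis on raw LLM responses
        [py, "code/analyze.py",
         "--target_dir", response_dir, "--response_type", "response"],
        # Analysis on processed LLM responses
        [py, "code/analyze.py",
         "--target_dir", processed_dir, "--response_type", "processed_response"],
        [py, "code/make_tables.py"],
    ]


def run_pipeline(response_dir, processed_dir, dry_run=False):
    """Print each command and, unless dry_run is set, execute it.

    check=True stops the pipeline at the first script that exits non-zero.
    """
    for cmd in build_pipeline(response_dir, processed_dir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline("responses", "processed", dry_run=True)` prints the commands without executing them, which is a cheap way to check the paths before spending API calls in `get_responses.py`.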