|
5 | 5 | A reproducible benchmark and analysis toolkit for evaluating *epistemic boundary behavior* of LLMs when tool access is **absent but implied** (the *Simulation Fallacy* condition). |
6 | 6 |
|
7 | 7 | **Core findings (paper):** |
8 | | -- GPT-5: ~98% silent refusal |
9 | | -- Gemini 2.5 Pro: ~81% fabrication |
10 | | -- Claude Sonnet 4: admission/fabrication oscillation |
| 8 | +- **GPT-5**: ~98% silent refusal (epistemic boundary respected) |
| 9 | +- **Gemini 2.5 Pro**: ~81% fabrication (high confabulation rate) |
| 10 | +- **Claude Sonnet 4**: admission/fabrication oscillation (inconsistent boundary behavior) |
11 | 11 |
|
12 | | -Companion to *The Mirror Loop* (arXiv:2510.21861). Part of Course Correct Labs' epistemic reliability program. |
| 12 | +Companion to *The Mirror Loop* ([arXiv:2510.21861](https://arxiv.org/abs/2510.21861)). Part of Course Correct Labs' epistemic reliability program. |
13 | 13 |
|
14 | | -## Repo structure |
15 | | -- `results/final/` — final JSON and *_stats.json outputs |
16 | | -- `figures/` — generated figures |
17 | | -- `scripts/` — minimal analysis |
18 | | -- `notebooks/` — Colab notebook |
19 | | -- `prompts/` — prompt templates (add any missing ones you used) |
| 14 | +--- |
| 15 | + |
| 16 | +## Repository Structure |
| 17 | + |
| 18 | +``` |
| 19 | +simulation-fallacy/ |
| 20 | +│ |
| 21 | +├── results/final/ # Final JSON outputs and stats (8 files + 3 IRR artifacts) |
| 22 | +├── figures/ # Generated figures (Figure 1 & 2) |
| 23 | +├── scripts/ # Minimal analysis scripts |
| 24 | +│ ├── compute_metrics.py # Label counts and percentages |
| 25 | +│ ├── plot_figures.py # Cross-domain distribution (Figure 1) |
| 26 | +│ └── plot_transitions.py # Turn-by-turn dynamics (Figure 2) |
| 27 | +├── notebooks/ # Colab-ready reproduction notebook |
| 28 | +├── prompts/ # Exact prompt templates used in study (11 .txt files) |
| 29 | +├── DATA_DICTIONARY.md # Schema and field definitions |
| 30 | +├── CITATION.cff # Citation metadata |
| 31 | +└── README.md # This file |
| 32 | +``` |
| 33 | + |
| 34 | +--- |
| 35 | + |
| 36 | +## Quickstart (Local) |
20 | 37 |
|
21 | | -## Quickstart (local) |
22 | 38 | ```bash |
23 | 39 | python -m venv .venv && source .venv/bin/activate |
24 | 40 | pip install -r requirements.txt |
25 | | -python scripts/compute_metrics.py --in_dir results/final --out_csv results/final/label_counts_with_pct.csv |
26 | | -python scripts/plot_figures.py --tables_csv results/final/label_counts_with_pct.csv --figdir figures |
| 41 | + |
| 42 | +# Compute label distributions |
| 43 | +python scripts/compute_metrics.py \ |
| 44 | + --in_dir results/final \ |
| 45 | + --out_csv results/final/label_counts_with_pct.csv |
| 46 | + |
| 47 | +# Regenerate Figure 1: Cross-domain response distribution |
| 48 | +python scripts/plot_figures.py \ |
| 49 | + --tables_csv results/final/label_counts_with_pct.csv \ |
| 50 | + --figdir figures |
| 51 | + |
| 52 | +# Regenerate Figure 2: Transition matrices |
| 53 | +python scripts/plot_transitions.py \ |
| 54 | + --in_dir results/final \ |
| 55 | + --figdir figures |
27 | 56 | ``` |
28 | 57 |
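As a rough illustration of what a "label counts and percentages" step can look like, the sketch below tallies labels per model and converts them to percentages. It is not the code in `scripts/compute_metrics.py`. The record fields `model` and `label`, the model names, and the function `label_percentages` are assumptions made for this example; the real schema is described in `DATA_DICTIONARY.md`.

```python
from collections import Counter

# Hypothetical records: the field names "model" and "label" are assumptions
# for illustration, not the schema documented in DATA_DICTIONARY.md.
records = [
    {"model": "model_a", "label": "FABRICATION"},
    {"model": "model_a", "label": "ADMISSION"},
    {"model": "model_a", "label": "FABRICATION"},
    {"model": "model_b", "label": "SILENT_REFUSAL"},
]


def label_percentages(rows):
    """Return {model: {label: (count, pct)}}, with pct rounded to one decimal."""
    counts = {}
    for row in rows:
        counts.setdefault(row["model"], Counter())[row["label"]] += 1

    table = {}
    for model, counter in counts.items():
        total = sum(counter.values())
        table[model] = {
            label: (n, round(100.0 * n / total, 1))
            for label, n in counter.items()
        }
    return table


table = label_percentages(records)

# Emit CSV-like rows: model,label,count,pct
for model, labels in sorted(table.items()):
    for label, (n, pct) in sorted(labels.items()):
        print(f"{model},{label},{n},{pct}")
```

Percentages are computed within each model, so each model's rows sum to roughly 100 regardless of how many responses it contributed.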
|
| 58 | +--- |
| 59 | + |
29 | 60 | ## Quickstart (Colab) |
30 | 61 |
|
31 | | -Open the badge above and Run all. |
| 62 | +Click the badge above and **Run all**. The notebook will: |
| 63 | +1. Clone this repository |
| 64 | +2. Install dependencies |
| 65 | +3. Compute metrics and regenerate both figures |
| 66 | +4. Display the results inline |
| 67 | + |
| 68 | +--- |
| 69 | + |
| 70 | +## Figures |
| 71 | + |
| 72 | +### Figure 1: Cross-Domain Response Distribution |
| 73 | +**File:** `figures/figure1_cross_domain.png` |
| 74 | +**Description:** Model-level label distributions (FABRICATION, ADMISSION, SILENT_REFUSAL, NULL) across all tool-absence conditions (web search, image reference, database schema, file access). |
| 75 | +**To reproduce:** Run `scripts/plot_figures.py` |
| 76 | + |
| 77 | +### Figure 2: Turn-by-Turn Transition Dynamics |
| 78 | +**File:** `figures/figure2_transition_matrices.png` |
| 79 | +**Description:** Transition probability matrices showing how labels change across consecutive turns in the persistence study (3-turn sequences). |
| 80 | +**To reproduce:** Run `scripts/plot_transitions.py` |
| 81 | + |
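A transition probability matrix of the kind described for Figure 2 can be sketched as follows: count each pair of consecutive labels in a sequence, then normalize every row so it sums to 1. This is a minimal sketch under stated assumptions, not the logic of `scripts/plot_transitions.py`. The label order, the example sequences, and the function `transition_matrix` are hypothetical.

```python
# Label set taken from the Figure 1 description; the ordering is an assumption.
LABELS = ["FABRICATION", "ADMISSION", "SILENT_REFUSAL", "NULL"]


def transition_matrix(sequences, labels=LABELS):
    """Row-normalized matrix: entry [i][j] estimates P(next = j | current = i)."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]

    # Count every consecutive (current, following) pair in each sequence.
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[index[current]][index[following]] += 1

    # Normalize each row; rows with no observations stay all zeros.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical 3-turn label sequences, one list per conversation.
sequences = [
    ["ADMISSION", "FABRICATION", "FABRICATION"],
    ["ADMISSION", "ADMISSION", "FABRICATION"],
]
matrix = transition_matrix(sequences)

for label, row in zip(LABELS, matrix):
    print(label, [round(p, 2) for p in row])
```

A 3-turn sequence contributes two transitions, so rows for labels that only ever appear on the final turn remain empty rather than being assigned a uniform distribution.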
| 82 | +--- |
32 | 83 |
|
33 | 84 | ## Data |
34 | 85 |
|
35 | | -We include the final canonical artifacts used in the paper under `results/final/`. Replace with your own runs to re-evaluate. |
| 86 | +We include the final canonical artifacts used in the paper under `results/final/`: |
| 87 | + |
| 88 | +- **Cross-domain study** (single-turn): |
| 89 | + - `cross_domain_v1_20251030_183025.json` + `_stats.json` |
| 90 | + - `cross_domain_v1_anthropic_catchup_20251030_233401.json` + `_stats.json` |
| 91 | + |
| 92 | +- **Persistence study** (3-turn sequences): |
| 93 | + - `persistence_v1_20251030_190503.json` + `_stats.json` |
| 94 | + - `persistence_v1_anthropic_catchup_20251030_234443.json` + `_stats.json` |
| 95 | + |
| 96 | +- **Inter-rater reliability**: |
| 97 | + - `irr_clean.csv`, `irr_confusion_matrix.csv`, `irr_report.md` |
| 98 | + |
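For readers who want to sanity-check an agreement figure from a rater-by-rater confusion matrix, Cohen's kappa is one common statistic. The sketch below is an assumption-laden illustration: the README does not state which statistic `irr_report.md` uses, the layout of `irr_confusion_matrix.csv` is not shown here, and the 2x2 matrix and the function `cohens_kappa` are made up for the example.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: rater A, cols: rater B)."""
    total = sum(sum(row) for row in confusion)

    # Observed agreement: share of items on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total

    # Chance agreement: product of the two raters' marginal proportions per label.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        # Both raters used a single label throughout; kappa is formally
        # undefined here, and this sketch reports 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts, not taken from the repository's IRR artifacts.
confusion = [
    [20, 5],
    [10, 15],
]
kappa = cohens_kappa(confusion)
print(round(kappa, 3))
```

With these made-up counts, observed agreement is 0.7 and chance agreement is 0.5, giving a kappa of roughly 0.4.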
| 99 | +**Schema documentation:** See [`DATA_DICTIONARY.md`](DATA_DICTIONARY.md) for field definitions and data structure. |
| 100 | + |
| 101 | +Replace these files with your own runs to re-evaluate the pipeline. |
| 102 | + |
| 103 | +--- |
| 104 | + |
| 105 | +## Reproducibility Checklist |
| 106 | + |
| 107 | +✅ **Data availability**: All final results (JSON, IRR artifacts) are included in `results/final/` |
| 108 | +✅ **Deterministic scripts**: Analysis scripts produce identical output given the same input files |
| 109 | +✅ **Figures regenerate**: Both figures reproduce from the included data (minor rendering differences are possible across matplotlib versions) |
| 110 | +✅ **Prompts published**: Exact prompt templates are in `prompts/` (11 .txt files) |
| 111 | +✅ **IRR artifacts**: Human inter-rater reliability data and reports are provided |
| 112 | +✅ **No secrets**: No API keys, credentials, or proprietary data are included |
| 113 | +✅ **Version constraints**: `requirements.txt` specifies minimum package versions (≥ constraints, not exact pins) |
| 114 | +✅ **Open license**: MIT license for code and artifacts |
| 115 | + |
| 116 | +**Note on LLM non-determinism**: Due to temperature/sampling and API-level variations, re-running the data collection pipeline will produce *similar* but not *identical* results. The published data represents the canonical run used in the paper. |
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +## Ethics & Risk Note |
| 121 | + |
| 122 | +- **No real user data**: All prompts are synthetic and designed to test epistemic boundaries, not to elicit harmful content. |
| 123 | +- **No secrets or credentials**: This repository contains no API keys, tokens, or proprietary information. |
| 124 | +- **Synthetic scenarios**: Prompt templates simulate tool-absence conditions (missing web search, database access, etc.) to measure model behavior under uncertainty. |
| 125 | +- **Research purpose**: This benchmark is intended for academic research and model safety evaluation. Findings should not be used to manipulate or mislead users. |
| 126 | + |
| 127 | +--- |
36 | 128 |
|
37 | 129 | ## Citation |
38 | 130 |
|
39 | | -See CITATION.cff. |
| 131 | +See [`CITATION.cff`](CITATION.cff) for machine-readable citation metadata. |
| 132 | + |
| 133 | +**BibTeX:** |
| 134 | +```bibtex |
| 135 | +@article{devilling2025simulation, |
| 136 | + title={Simulation Fallacy: How Models Behave When Tool Access Is Missing}, |
| 137 | + author={DeVilling, Bentley}, |
| 138 | + year={2025}, |
| 139 | + url={https://github.com/Course-Correct-Labs/simulation-fallacy} |
| 140 | +} |
| 141 | +``` |
| 142 | + |
| 143 | +--- |
| 144 | + |
| 145 | +## Related Work |
| 146 | + |
| 147 | +- [The Mirror Loop](https://arxiv.org/abs/2510.21861) — Semantic drift and novelty dynamics in recursive LLM self-interaction |
| 148 | +- [Recursive Confabulation](https://github.com/Course-Correct-Labs/recursive-confabulation) — Multi-turn hallucination persistence benchmark |
| 149 | + |
| 150 | +--- |
| 151 | + |
| 152 | +## Questions or Issues? |
| 153 | + |
| 154 | +Open an issue at [github.com/Course-Correct-Labs/simulation-fallacy/issues](https://github.com/Course-Correct-Labs/simulation-fallacy/issues). |
| 155 | + |
| 156 | +--- |
| 157 | + |
| 158 | +**License:** MIT |
| 159 | +**Maintained by:** [Course Correct Labs](https://github.com/Course-Correct-Labs) |