Commit d95d74f
Add comprehensive documentation and Figure 2 reproducibility
- Fix Colab notebook with proper repo clone and working-directory setup
- Add plot_transitions.py for Figure 2 (transition matrices) reproducibility
- Add DATA_DICTIONARY.md with complete schema documentation
- Enhance README with: a Figures section documenting both figures, a reproducibility checklist, an ethics & risk note, a data dictionary link, improved quickstart instructions, and a related work section
- Regenerate Figure 2 with transition matrices

Addresses all gaps from the ChatGPT audit. Repo now publication-ready.
1 parent 324eb6c commit d95d74f

File tree

5 files changed: +450 −25 lines

DATA_DICTIONARY.md

Lines changed: 128 additions & 0 deletions
# Data Dictionary

This document describes the structure and fields of all JSON files in `results/final/`.

## File Naming Convention

- `{study_type}_v{version}_{timestamp}.json` — Full per-sample results
- `{study_type}_v{version}_{timestamp}_stats.json` — Aggregated statistics

Study types:
- `cross_domain` — Single-turn responses across different tool-absence conditions (web, image, database, file)
- `persistence` — Multi-turn (3 turns) to measure label stability over repeated queries
## Stats Files (`*_stats.json`)

Aggregated model-level statistics.

### Top-level Fields

| Field | Type | Description |
|-------|------|-------------|
| `total_responses` | int | Total number of responses across all models |
| `total_calls` | int | Total API calls made (may include retries) |
| `by_model` | object | Statistics grouped by model identifier |

### `by_model[model_name]` Object

| Field | Type | Description |
|-------|------|-------------|
| `total` | int | Total responses for this model |
| `labels` | object | Raw counts per label (FABRICATION, ADMISSION, SILENT_REFUSAL, NULL) |
| `rates` | object | Proportions (0–1) for each label |
| `cis_wilson_95` | object | 95% confidence intervals (Wilson score) for each label |
| `cis_wilson_95[label].lo` | float | Lower bound of the 95% CI |
| `cis_wilson_95[label].hi` | float | Upper bound of the 95% CI |
| `blame_rate` | float | Proportion of responses that contain blame language (deprecated/optional) |
| `cost_usd` | float | Total cost in USD for this model's API calls |
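The `cis_wilson_95` bounds can be recomputed from the raw `labels` counts. A minimal sketch using the standard Wilson score formula (the repo's own implementation may differ in details):

```python
import math

def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total == 0:
        return (0.0, 0.0)
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return (center - margin, center + margin)

# e.g. 98 SILENT_REFUSAL responses out of 100 for one model
lo, hi = wilson_ci(98, 100)
```

Unlike the naive normal-approximation interval, the Wilson interval stays inside [0, 1] even at extreme rates like 98% — which matters for the near-ceiling refusal rates reported here.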
### Label Taxonomy

| Label | Description |
|-------|-------------|
| `FABRICATION` | Model generates plausible but false output (hallucination under tool absence) |
| `ADMISSION` | Model explicitly states it cannot perform the task |
| `SILENT_REFUSAL` | Model returns a structured refusal (e.g., `null` values, empty fields) without explanation |
| `NULL` | Ambiguous or unclassifiable response |
## Full Result Files (`*.json` without `_stats`)

Per-sample results with full response data.

### Top-level Structure

```json
{
  "config": { ... },
  "results": { "model_name": [ ... ] },
  "total_spend": float,
  "elapsed": float,
  "completed": timestamp
}
```
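As a sketch of how this structure can be consumed, the snippet below recomputes per-model label proportions from a loaded result file. The helper name `label_rates` is illustrative, not part of the repo's scripts:

```python
import json
from collections import Counter

def label_rates(run: dict) -> dict:
    """Per-model label proportions from a full result file's top-level dict."""
    out = {}
    for model, records in run["results"].items():
        counts = Counter(r["classification"] for r in records)
        total = sum(counts.values())
        out[model] = {label: n / total for label, n in counts.items()}
    return out

# Usage (any full result file from results/final/ works):
# with open("results/final/persistence_v1_20251030_190503.json") as f:
#     print(label_rates(json.load(f)))
```

These proportions should match the `rates` object in the corresponding `_stats.json` file.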
### `config` Object

| Field | Type | Description |
|-------|------|-------------|
| `budget_usd_cap` | float | Maximum budget allowed for the run |
| `conditions` | array | List of experimental conditions (tool-absence scenarios) |
| `conditions[i].id` | string | Condition identifier (e.g., `no_web_search`) |
| `conditions[i].template` | string | Prompt template filename used |
| `models` | array | List of models tested |
| `models[i].model` | string | Model identifier (e.g., `gpt-5`) |
| `models[i].provider` | string | Provider name (`openai`, `anthropic`, `google`) |
| `max_completion_tokens_*` | int | Maximum tokens per completion (provider-specific) |
### `results[model_name]` Array

Each element is a single API call result:

| Field | Type | Description |
|-------|------|-------------|
| `dedupe_key` | string | SHA-256 hash identifying a unique prompt+condition+seed combination |
| `provider` | string | API provider (`openai`, `anthropic`, `google`) |
| `model` | string | Full model identifier |
| `condition_id` | string | Experimental condition ID (links to `config.conditions`) |
| `seed` | int | Random seed for this sample (for reproducibility) |
| `turn_index` | int | Turn number (0-indexed; multi-turn only in the `persistence` study) |
| `success` | bool | Whether the API call succeeded |
| `classification` | string | Human/automated label (FABRICATION, ADMISSION, SILENT_REFUSAL, NULL) |
| `response_content` | string | Raw model response (may be JSON, text, or structured output) |
| `tokens_prompt` | int | Input tokens used |
| `tokens_completion` | int | Output tokens generated |
| `cost_usd` | float | Cost of this individual call |
| `timestamp` | string | ISO 8601 timestamp of the API call |
### Multi-turn Sequences (Persistence Study Only)

Responses with the same `dedupe_key` form a sequence. Use `turn_index` to order them chronologically. The `persistence` study has 3 turns per sequence (turns 0, 1, and 2).

**Transition matrices** are computed from the pairs `(classification[turn_N], classification[turn_N+1])` within each sequence.
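The pairing described above can be sketched as follows. `transition_counts` is a hypothetical helper for illustration, not the actual code in `scripts/plot_transitions.py`:

```python
from collections import defaultdict

LABELS = ["FABRICATION", "ADMISSION", "SILENT_REFUSAL", "NULL"]

def transition_counts(records: list[dict]) -> dict:
    """Count label transitions between consecutive turns.

    `records` is a list of per-call dicts with `dedupe_key`, `turn_index`,
    and `classification` fields, as in results[model_name].
    """
    # Group calls into sequences by dedupe_key
    sequences = defaultdict(list)
    for r in records:
        sequences[r["dedupe_key"]].append(r)
    # Count (turn_N -> turn_N+1) label pairs within each sequence
    counts = {a: {b: 0 for b in LABELS} for a in LABELS}
    for seq in sequences.values():
        seq.sort(key=lambda r: r["turn_index"])
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev["classification"]][nxt["classification"]] += 1
    return counts
```

Row-normalizing these counts yields the transition probability matrices shown in Figure 2.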
## Inter-Rater Reliability Files

| File | Description |
|------|-------------|
| `irr_clean.csv` | Human-labeled subset for IRR validation |
| `irr_confusion_matrix.csv` | Agreement matrix between the two raters |
| `irr_report.md` | Cohen's κ and agreement statistics |

Columns in `irr_clean.csv`:
- `sample_id` — Unique identifier
- `model` — Model tested
- `condition_id` — Experimental condition
- `response_content` — Model output
- `rater_1` — Label assigned by the first rater
- `rater_2` — Label assigned by the second rater
- `consensus` — Final agreed label (used in the main analysis)
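Cohen's κ as reported in `irr_report.md` can be recomputed from the `rater_1` and `rater_2` columns. A minimal sketch of the standard formula (assumes the raters are not in perfect chance-level agreement, so the denominator is nonzero):

```python
def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two raters labeling the same samples."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    p_e = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (p_o - p_e) / (1 - p_e)
```

κ = 1 indicates perfect agreement; κ ≈ 0 indicates agreement no better than chance.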
## Reproducibility Notes

- All `dedupe_key` values are deterministic: changing the prompt, condition, or seed will produce a different hash.
- `turn_index` is always `0` for single-turn studies (`cross_domain`).
- Cost estimates are based on provider-reported token counts at the time of execution (rates may change).
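A plausible reconstruction of the deterministic hashing is sketched below. The exact serialization and field order used by the pipeline are not documented here, so this is an assumption, not the repo's actual code:

```python
import hashlib
import json

def dedupe_key(prompt: str, condition_id: str, seed: int) -> str:
    """Hypothetical dedupe key: SHA-256 over a canonical JSON serialization
    of the prompt, condition, and seed (assumed field set and ordering)."""
    payload = json.dumps(
        {"prompt": prompt, "condition_id": condition_id, "seed": seed},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any change to prompt, condition, or seed yields a different 64-character hex digest, which is what makes the keys usable for both deduplication and sequence grouping.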
## Questions?

Open an issue at [github.com/Course-Correct-Labs/simulation-fallacy](https://github.com/Course-Correct-Labs/simulation-fallacy).

README.md

Lines changed: 136 additions & 16 deletions
A reproducible benchmark and analysis toolkit for evaluating *epistemic boundary behavior* of LLMs when tool access is **absent but implied** (the *Simulation Fallacy* condition).

**Core findings (paper):**
- **GPT-5**: ~98% silent refusal (epistemic boundary respected)
- **Gemini 2.5 Pro**: ~81% fabrication (high confabulation rate)
- **Claude Sonnet 4**: admission/fabrication oscillation (inconsistent boundary behavior)

Companion to *The Mirror Loop* ([arXiv:2510.21861](https://arxiv.org/abs/2510.21861)). Part of Course Correct Labs' epistemic reliability program.

---

## Repository Structure

```
simulation-fallacy/
├── results/final/          # Final JSON outputs and stats (8 files + 3 IRR artifacts)
├── figures/                # Generated figures (Figures 1 & 2)
├── scripts/                # Minimal analysis scripts
│   ├── compute_metrics.py  # Label counts and percentages
│   ├── plot_figures.py     # Cross-domain distribution (Figure 1)
│   └── plot_transitions.py # Turn-by-turn dynamics (Figure 2)
├── notebooks/              # Colab-ready reproduction notebook
├── prompts/                # Exact prompt templates used in the study (11 .txt files)
├── DATA_DICTIONARY.md      # Schema and field definitions
├── CITATION.cff            # Citation metadata
└── README.md               # This file
```

---
## Quickstart (Local)

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Compute label distributions
python scripts/compute_metrics.py \
  --in_dir results/final \
  --out_csv results/final/label_counts_with_pct.csv

# Regenerate Figure 1: cross-domain response distribution
python scripts/plot_figures.py \
  --tables_csv results/final/label_counts_with_pct.csv \
  --figdir figures

# Regenerate Figure 2: transition matrices
python scripts/plot_transitions.py \
  --in_dir results/final \
  --figdir figures
```

---

## Quickstart (Colab)

Click the badge above and **Run all**. The notebook will:
1. Clone this repository
2. Install dependencies
3. Compute metrics and regenerate both figures
4. Display the results inline

---
## Figures

### Figure 1: Cross-Domain Response Distribution
**File:** `figures/figure1_cross_domain.png`
**Description:** Model-level label distributions (FABRICATION, ADMISSION, SILENT_REFUSAL, NULL) across all tool-absence conditions (web search, image reference, database schema, file access).
**Reproduce:** Run `scripts/plot_figures.py`

### Figure 2: Turn-by-Turn Transition Dynamics
**File:** `figures/figure2_transition_matrices.png`
**Description:** Transition probability matrices showing how labels change across consecutive turns in the persistence study (3-turn sequences).
**Reproduce:** Run `scripts/plot_transitions.py`

---

## Data

We include the final canonical artifacts used in the paper under `results/final/`:

- **Cross-domain study** (single-turn):
  - `cross_domain_v1_20251030_183025.json` + `_stats.json`
  - `cross_domain_v1_anthropic_catchup_20251030_233401.json` + `_stats.json`
- **Persistence study** (3-turn sequences):
  - `persistence_v1_20251030_190503.json` + `_stats.json`
  - `persistence_v1_anthropic_catchup_20251030_234443.json` + `_stats.json`
- **Inter-rater reliability**:
  - `irr_clean.csv`, `irr_confusion_matrix.csv`, `irr_report.md`

**Schema documentation:** See [`DATA_DICTIONARY.md`](DATA_DICTIONARY.md) for field definitions and data structure.

Replace these files with your own runs to re-evaluate the pipeline.

---
## Reproducibility Checklist

- **Data availability**: All final results (JSON, IRR artifacts) are included in `results/final/`
- **Deterministic scripts**: Analysis scripts produce identical output given the same input files
- **Figures regenerate**: Both figures reproduce from the included data (minor matplotlib version differences possible)
- **Prompts published**: Exact prompt templates are in `prompts/` (11 .txt files)
- **IRR artifacts**: Human inter-rater reliability data and reports are provided
- **No secrets**: No API keys, credentials, or proprietary data are included
- **Version pinning**: `requirements.txt` specifies package versions (≥ constraints)
- **Open license**: MIT license for code and artifacts

**Note on LLM non-determinism**: Due to temperature/sampling and API-level variation, re-running the data collection pipeline will produce *similar* but not *identical* results. The published data represents the canonical run used in the paper.

---
## Ethics & Risk Note

- **No real user data**: All prompts are synthetic and designed to test epistemic boundaries, not to elicit harmful content.
- **No secrets or credentials**: This repository contains no API keys, tokens, or proprietary information.
- **Synthetic scenarios**: Prompt templates simulate tool-absence conditions (missing web search, database access, etc.) to measure model behavior under uncertainty.
- **Research purpose**: This benchmark is intended for academic research and model safety evaluation. Findings should not be used to manipulate or mislead users.

---
## Citation

See [`CITATION.cff`](CITATION.cff) for machine-readable citation metadata.

**BibTeX:**
```bibtex
@misc{devilling2025simulation,
  title={Simulation Fallacy: How Models Behave When Tool Access Is Missing},
  author={DeVilling, Bentley},
  year={2025},
  url={https://github.com/Course-Correct-Labs/simulation-fallacy}
}
```

---
## Related Work

- [The Mirror Loop](https://arxiv.org/abs/2510.21861) — Semantic drift and novelty dynamics in recursive LLM self-interaction
- [Recursive Confabulation](https://github.com/Course-Correct-Labs/recursive-confabulation) — Multi-turn hallucination persistence benchmark

---

## Questions or Issues?

Open an issue at [github.com/Course-Correct-Labs/simulation-fallacy/issues](https://github.com/Course-Correct-Labs/simulation-fallacy/issues).

---

**License:** MIT
**Maintained by:** [Course Correct Labs](https://github.com/Course-Correct-Labs)