docs: link retrieval eval pipeline spec

sjarmak · sjarmak · commit dba9a393a301 · 2026-02-24T17:31:50.000Z
diff --git a/AGENTS.md b/AGENTS.md
@@ -19,6 +19,7 @@ per-task details. See `docs/MCP_UNIQUE_TASKS.md` for the MCP-unique extension.
 - `docs/TASK_CATALOG.md` - current task inventory
 - `docs/SCORING_SEMANTICS.md` - reward/pass interpretation (incl. oracle checks + hybrid scoring)
 - `docs/EVALUATION_PIPELINE.md` - unified eval: verifier → LLM judge → statistics → report
+- `docs/RETRIEVAL_EVAL_SPEC.md` - full retrieval/IR evaluation pipeline (normalized events → metrics/probes/taxonomy artifacts)
 - `docs/MCP_UNIQUE_TASKS.md` - MCP-unique task system (suites, authoring, oracle, DS tasks)
 - `docs/MCP_UNIQUE_CALIBRATION.md` - oracle coverage analysis and threshold calibration data
 - `docs/WORKFLOW_METRICS.md` - timing/cost metric definitions
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -20,6 +20,7 @@ per-task details. See `docs/MCP_UNIQUE_TASKS.md` for the MCP-unique extension.
 - `docs/TASK_CATALOG.md` - current task inventory
 - `docs/SCORING_SEMANTICS.md` - reward/pass interpretation (incl. oracle checks + hybrid scoring)
 - `docs/EVALUATION_PIPELINE.md` - unified eval: verifier → LLM judge → statistics → report
+- `docs/RETRIEVAL_EVAL_SPEC.md` - full retrieval/IR evaluation pipeline (normalized events → metrics/probes/taxonomy artifacts)
 - `docs/MCP_UNIQUE_TASKS.md` - MCP-unique task system (suites, authoring, oracle, DS tasks)
 - `docs/MCP_UNIQUE_CALIBRATION.md` - oracle coverage analysis and threshold calibration data
 - `docs/WORKFLOW_METRICS.md` - timing/cost metric definitions
diff --git a/docs/EVALUATION_PIPELINE.md b/docs/EVALUATION_PIPELINE.md
@@ -6,7 +6,10 @@ and statistical modules provide confidence intervals and correlation analysis.
 
 This document covers the pipeline architecture. For per-benchmark scoring
 details, see [SCORING_SEMANTICS.md](SCORING_SEMANTICS.md). For MCP-unique
-oracle checks, see [MCP_UNIQUE_TASKS.md](MCP_UNIQUE_TASKS.md).
+oracle checks, see [MCP_UNIQUE_TASKS.md](MCP_UNIQUE_TASKS.md). For the full
+retrieval/IR evaluation pipeline (normalized retrieval events, file/chunk IR
+metrics, utilization probes, taxonomy, and emitted artifacts), see
+[RETRIEVAL_EVAL_SPEC.md](RETRIEVAL_EVAL_SPEC.md).
 
 ---