generated from hack-ink/vibe-mono
Closed
Labels
- area:ops (Local dev, scripts, and operational runbooks.)
- kind:chore (Maintenance work that does not fit other kinds.)
- theme:evaluation (Quality measurement, gold sets, regressions, and metrics.)
Description
Summary
Add CI regression gates using elf-eval trace compare.
Problem
Without hard quality gates in CI, retrieval regressions are discovered too late, often only after the offending change has been merged.
In Scope
- Add a CI stage that runs trace-compare against a fixed set of trace IDs.
- Define thresholds for positional churn, set churn, and top-k retention.
- Fail CI when thresholds are exceeded.
- Publish machine-readable diff artifacts.
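The issue names three churn metrics without defining them. The sketch below shows one plausible set of definitions for comparing a baseline ranking against a candidate ranking for the same trace. These formulas are assumptions for illustration (positional churn as the share of top-k slots whose item changed, set churn as the Jaccard distance between top-k sets, top-k retention as the share of baseline top-k items still present); elf-eval's own definitions may differ.

```python
# Hypothetical metric definitions; not taken from elf-eval.
from dataclasses import dataclass


@dataclass(frozen=True)
class ChurnMetrics:
    positional_churn: float  # share of top-k slots whose item changed
    set_churn: float         # Jaccard distance between the two top-k sets
    top_k_retention: float   # share of baseline top-k items kept by the candidate


def compare_rankings(baseline: list[str], candidate: list[str], k: int) -> ChurnMetrics:
    """Compare two ranked lists of result IDs produced for the same trace."""
    if k <= 0:
        raise ValueError("k must be positive")
    base_k, cand_k = baseline[:k], candidate[:k]
    if not base_k:
        raise ValueError("baseline must contain at least one result")

    # Slot-by-slot comparison; a slot present in only one list counts as changed.
    slots = max(len(base_k), len(cand_k))
    changed = sum(
        1
        for i in range(slots)
        if i >= len(base_k) or i >= len(cand_k) or base_k[i] != cand_k[i]
    )

    base_set, cand_set = set(base_k), set(cand_k)
    shared = len(base_set & cand_set)
    return ChurnMetrics(
        positional_churn=changed / slots,
        set_churn=1.0 - shared / len(base_set | cand_set),
        top_k_retention=shared / len(base_set),
    )


def aggregate(pairs: dict[str, tuple[list[str], list[str]]], k: int) -> ChurnMetrics:
    """Average per-trace metrics over a fixed set of trace IDs."""
    if not pairs:
        raise ValueError("need at least one trace")
    per_trace = [compare_rankings(base, cand, k) for base, cand in pairs.values()]
    n = len(per_trace)
    return ChurnMetrics(
        positional_churn=sum(m.positional_churn for m in per_trace) / n,
        set_churn=sum(m.set_churn for m in per_trace) / n,
        top_k_retention=sum(m.top_k_retention for m in per_trace) / n,
    )
```

For example, a baseline of a, b, c, d and a candidate of a, c, b, e at k = 4 give a positional churn of 0.75 (three of four slots changed), a set churn of 0.4 (three shared items out of five distinct), and a top-k retention of 0.75. A plain mean is the simplest aggregate; a real gate might also check the worst single trace so one badly regressed query is not hidden by the average.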
Out of Scope
- Replacing existing integration tests.
- Full benchmark infrastructure expansion.
Deliverables
- CI workflow updates.
- Baseline trace set.
- Gate configuration docs.
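The gate configuration could be as small as one upper bound per churn metric and one lower bound for retention. The sketch below assumes a JSON config; the key names (max_positional_churn, max_set_churn, min_top_k_retention) and the limit values are hypothetical placeholders, not agreed thresholds.

```python
# Hypothetical threshold gate for CI. Key names and limits are illustrative
# assumptions, not elf-eval's actual configuration.
import json

# Churn metrics are upper bounds (lower is better); retention is a lower
# bound (higher is better).
EXAMPLE_CONFIG = """
{
  "max_positional_churn": 0.30,
  "max_set_churn": 0.20,
  "min_top_k_retention": 0.80
}
"""

REQUIRED_KEYS = ("max_positional_churn", "max_set_churn", "min_top_k_retention")


def load_thresholds(text: str) -> dict[str, float]:
    """Parse a JSON threshold config and reject configs with missing keys."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing threshold keys: {missing}")
    return {key: float(config[key]) for key in REQUIRED_KEYS}


def check_gate(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return one message per violated threshold; an empty list means pass."""
    violations = []
    if metrics["positional_churn"] > thresholds["max_positional_churn"]:
        violations.append(
            f"positional_churn {metrics['positional_churn']:.3f} > "
            f"{thresholds['max_positional_churn']:.3f}"
        )
    if metrics["set_churn"] > thresholds["max_set_churn"]:
        violations.append(
            f"set_churn {metrics['set_churn']:.3f} > {thresholds['max_set_churn']:.3f}"
        )
    if metrics["top_k_retention"] < thresholds["min_top_k_retention"]:
        violations.append(
            f"top_k_retention {metrics['top_k_retention']:.3f} < "
            f"{thresholds['min_top_k_retention']:.3f}"
        )
    return violations


def gate_exit_code(metrics: dict[str, float], config_text: str) -> int:
    """Print each violation and return a process exit code (non-zero fails CI)."""
    violations = check_gate(metrics, load_thresholds(config_text))
    for message in violations:
        print(f"regression gate: {message}")
    return 1 if violations else 0
```

A CI entry point would pass the returned code to sys.exit so the job fails whenever any threshold is exceeded, which is what makes the gate block merges by default. Rejecting configs with missing keys keeps a typo from silently disabling a check.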
Acceptance Criteria
- CI runs compare mode reproducibly.
- Regression failures surface actionable metrics: which metric exceeded which threshold, and for which trace IDs.
- Local reproduction instructions are documented.
Dependencies
- Search: graph relation context and explain integration #50
- Memory Policy v2: explicit remember/update/ignore/reject decisions #51
Implementation Checklist
- Baseline trace snapshot committed.
- CI job added.
- Threshold config documented.
- Failure artifacts exported.
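For the exported failure artifacts, one option is a JSON document with a per-trace diff listing removed items, added items, and rank changes. The field names below (trace_id, removed, added, rank_changes, baseline_rank, candidate_rank, passed) are a hypothetical schema chosen for illustration, and the sketch assumes result IDs are unique within each ranking.

```python
# Hypothetical artifact schema; field names are illustrative, not elf-eval's.
import json
from pathlib import Path


def build_diff_record(trace_id: str, baseline: list[str], candidate: list[str], k: int) -> dict:
    """Describe how the candidate top-k differs from the baseline top-k."""
    base_k, cand_k = baseline[:k], candidate[:k]
    # 1-based ranks; assumes IDs are unique within each list.
    base_rank = {item: i + 1 for i, item in enumerate(base_k)}
    cand_rank = {item: i + 1 for i, item in enumerate(cand_k)}
    added = [item for item in cand_k if item not in base_rank]
    # Baseline items first, then new entrants in candidate order.
    items = base_k + added
    return {
        "trace_id": trace_id,
        "k": k,
        "removed": [item for item in base_k if item not in cand_rank],
        "added": added,
        "rank_changes": [
            {
                "item": item,
                "baseline_rank": base_rank.get(item),   # None if newly added
                "candidate_rank": cand_rank.get(item),  # None if removed
            }
            for item in items
            if base_rank.get(item) != cand_rank.get(item)
        ],
    }


def serialize_artifact(records: list[dict], passed: bool) -> str:
    """Render a stable JSON document (sorted keys and traces) for diffing."""
    payload = {"passed": passed, "traces": sorted(records, key=lambda r: r["trace_id"])}
    return json.dumps(payload, indent=2, sort_keys=True) + "\n"


def write_artifact(path: Path, records: list[dict], passed: bool) -> None:
    """Write the artifact file, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(serialize_artifact(records, passed), encoding="utf-8")
```

Sorting traces and keys keeps the output byte-stable across runs, so two artifacts can themselves be compared with an ordinary text diff when reproducing a failure locally.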
Done When
- Retrieval quality regressions block merges by default.