Quality gate: CI regression checks with elf-eval trace compare #54

@aurexav

Summary

Add CI regression gates using elf-eval trace compare.

Problem

Without hard quality gates, retrieval regressions are discovered too late.

In Scope

  • Add CI stage running trace-compare against fixed trace IDs.
  • Define thresholds for positional churn, set churn, and top-k retention.
  • Fail CI when thresholds are exceeded.
  • Publish machine-readable diff artifacts.
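The three thresholded metrics in the list above can be sketched as follows. This is a minimal illustration, not elf-eval's actual implementation: the metric definitions (position-wise mismatch rate, Jaccard distance, baseline overlap), the function names, and the default threshold values are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical defaults; real values would come from the gate configuration.
    max_positional_churn: float = 0.30
    max_set_churn: float = 0.20
    min_top_k_retention: float = 0.80


def compare_ranking(baseline, candidate, k):
    """Compare two ranked lists of result IDs over their top-k entries."""
    base, cand = baseline[:k], candidate[:k]
    base_set, cand_set = set(base), set(cand)
    shared = base_set & cand_set
    union = base_set | cand_set
    n = max(len(base), len(cand))
    # A position counts as changed if the IDs differ or one list is shorter.
    changed = sum(
        1
        for i in range(n)
        if i >= len(base) or i >= len(cand) or base[i] != cand[i]
    )
    return {
        # Assumed definition: share of top-k positions holding a different ID.
        "positional_churn": changed / n if n else 0.0,
        # Assumed definition: Jaccard distance between the two top-k sets.
        "set_churn": 1 - len(shared) / len(union) if union else 0.0,
        # Assumed definition: share of baseline top-k IDs still in the top-k.
        "top_k_retention": len(shared) / len(base_set) if base_set else 1.0,
    }


def violations(metrics, thresholds):
    """Return the names of metrics that breach their threshold."""
    breached = []
    if metrics["positional_churn"] > thresholds.max_positional_churn:
        breached.append("positional_churn")
    if metrics["set_churn"] > thresholds.max_set_churn:
        breached.append("set_churn")
    if metrics["top_k_retention"] < thresholds.min_top_k_retention:
        breached.append("top_k_retention")
    return breached
```

For example, `compare_ranking(["a", "b", "c", "d"], ["a", "c", "b", "e"], 4)` changes three of four positions and drops one baseline ID, so all three hypothetical thresholds would be breached and the gate would fail.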

Out of Scope

  • Replacing existing integration tests.
  • Full benchmark infrastructure expansion.

Deliverables

  • CI workflow updates.
  • Baseline trace set.
  • Gate configuration docs.

Acceptance Criteria

  • CI runs compare mode reproducibly.
  • Regressions surface actionable metrics.
  • Local reproduction instructions are documented.
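One way to make a failure actionable is to have the gate report which trace and which metric breached its limit, alongside the exit code that CI uses to block the merge. The sketch below assumes per-trace metrics are already available as a plain dictionary; the metric names, threshold values, report layout, and trace IDs are hypothetical and do not reflect elf-eval's real output format.

```python
import json

# Hypothetical thresholds: ("max", x) fails above x, ("min", x) fails below x.
THRESHOLDS = {
    "positional_churn": ("max", 0.30),
    "set_churn": ("max", 0.20),
    "top_k_retention": ("min", 0.80),
}


def evaluate_gate(per_trace_metrics, thresholds=THRESHOLDS):
    """Return (report, exit_code) for a mapping of trace ID to metrics."""
    failures = []
    # Sort trace IDs so the report is identical from run to run.
    for trace_id in sorted(per_trace_metrics):
        metrics = per_trace_metrics[trace_id]
        for name, (kind, limit) in thresholds.items():
            value = metrics[name]
            breached = value > limit if kind == "max" else value < limit
            if breached:
                failures.append(
                    {
                        "trace_id": trace_id,
                        "metric": name,
                        "value": value,
                        "limit": limit,
                        "kind": kind,
                    }
                )
    report = {"passed": not failures, "failures": failures}
    return report, (0 if not failures else 1)


def render_report(report):
    """Serialize the report as stable, machine-readable JSON."""
    return json.dumps(report, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative input only; real values would come from the compare step.
    sample = {
        "trace-a": {"positional_churn": 0.10, "set_churn": 0.05, "top_k_retention": 0.95},
        "trace-b": {"positional_churn": 0.50, "set_churn": 0.10, "top_k_retention": 0.90},
    }
    report, exit_code = evaluate_gate(sample)
    print(render_report(report))
    raise SystemExit(exit_code)
```

Because the same script runs locally and in CI, a contributor can reproduce a failing gate by feeding it the same metrics file the CI job produced.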

Dependencies

Implementation Checklist

  • Baseline trace snapshot committed.
  • CI job added.
  • Threshold config documented.
  • Failure artifacts exported.

Done When

  • Retrieval quality regressions block merges by default.

Metadata

Assignees

No one assigned

Labels

  • area:ops (Local dev, scripts, and operational runbooks.)
  • kind:chore (Maintenance work that does not fit other kinds.)
  • theme:evaluation (Quality measurement, gold sets, regressions, and metrics.)
