feat: capture context-intelligence design knowledge into the mode (eval-driven) #33

Open
colombod wants to merge 4 commits into main from feat/context-intelligence-mode-knowledge

Collaborator

@colombod colombod commented Jun 7, 2026

What this adds

Captures the institutional design knowledge for context-intelligence tooling into the mode itself, so it produces deeper, more complete designs and drives its design pipeline more robustly.

Design depth

  • Bounded-navigation discipline moved to a single authoritative home (context/navigation-budget-discipline.md) and referenced by session-navigator via @mention — one source of truth for keeping disk navigation within a context budget (no duplication).
  • Tool-design skill enriched (R1/R2/R3): module-vs-CLI selection by consumer, narrow-domain specialization, and progressive discovery — each pointing to its authoritative home.
  • New evaluation-methodology skill: metric design (quality/efficiency/efficacy), precursor-over-artifact metrics, and A/B + statistical-N discipline.
  • Thin strategy file + mode wiring that surface this knowledge only when the mode is active (lean always-on preserved).
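
To make the A/B and statistical-N point concrete, here is a minimal Python sketch of one common way to size an A/B eval: the normal-approximation sample size for comparing two pass rates. The PR does not say which statistical method the skill prescribes, so the choice of formula, the function name `runs_per_arm`, and the default `alpha` and `power` values are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def runs_per_arm(p_baseline: float, p_variant: float,
                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate eval runs needed per arm to detect a pass-rate difference.

    Hypothetical helper (not part of this PR): two-sided two-proportion
    z-test sizing under the normal approximation.
    """
    if p_baseline == p_variant:
        raise ValueError("pass rates must differ to size the comparison")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    p_bar = (p_baseline + p_variant) / 2           # pooled rate under the null
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_baseline * (1 - p_baseline)
                             + p_variant * (1 - p_variant))
    ) ** 2
    return math.ceil(numerator / (p_baseline - p_variant) ** 2)
```

Calling `runs_per_arm(0.5, 0.7)` asks how many runs each arm needs to tell a 50% pass rate from a 70% one. Smaller expected differences require many more runs.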

Interactive & autonomous driving

  • The design facilitator opens with its actual Phase-0 question, re-anchors on off-script replies instead of breaking role, and owns its pipeline end-to-end.
  • A new autonomous/seeded entry path: when the activation message already carries the goal, the facilitator treats it as a pre-answered Phase 0 and proceeds — enabling non-interactive / recipe-driven use.
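
The seeded entry decision can be sketched in Python as follows. In the mode itself this is a Markdown routing row interpreted by the model, not code, so the function `route_activation`, the route labels, and the `has_clear_goal` flag (standing in for the model's judgment of the activation message) are hypothetical. Only `seed_statement` and `context_depth=none` come from the commit notes in this PR.

```python
def route_activation(message: str,
                     domain_concepts_exists: bool,
                     has_clear_goal: bool) -> dict:
    """Illustrative routing for a mode activation (hypothetical names)."""
    if domain_concepts_exists:
        # Routing when domain-concepts.md is present is not described in this PR.
        return {"route": "existing-concepts"}
    if has_clear_goal:
        # Seeded path: pass the user's goal verbatim so the facilitator
        # treats it as a pre-answered Phase 0.
        return {
            "route": "seeded",
            "seed_statement": message,
            "context_depth": "none",
        }
    # Interactive path: the facilitator opens with its Phase-0 question.
    return {"route": "interactive"}
```

Because the seeded branch is a separate return, the interactive path behaves the same as before, which matches the PR's description of the seeded path as additive.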

How the evaluation framework shaped this

The work was driven and validated by an outcome-eval harness with three scenarios (a pre-seeded design run, a multi-turn simulated user, and a one-shot). The evals did more than verify at the end — the baselines reshaped the scope, showing the mode's design depth was the highest-leverage gap. Every change maps to a scenario that measures it, so we built what we set out to build and can show it.

Measured evidence

| Scenario | Before (baseline) | After |
| --- | --- | --- |
| Seeded design run | timed out before producing an evaluation plan | converges (exit 0, ~4 min): deeper Phase-2 design (per-consumer module-vs-CLI table citing the navigation-budget discipline) plus a complete Phase-3 evaluation plan (precision/recall, success criteria, DTU validation) |

  • Single-source/no-duplication and mode-gated injection verified by inspection; lean always-on preserved.
  • Bundle test suite: 657 passed.
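
The single-source property above was verified by inspection. As a rough illustration of how such a check could be mechanized, the Python sketch below flags substantial lines of an authoritative file that reappear verbatim in another file. The helper name `duplicated_lines` and the `min_len` threshold are assumptions, and nothing like this is part of the PR.

```python
def duplicated_lines(authoritative: str, other: str,
                     min_len: int = 40) -> list[str]:
    """Return lines of `authoritative` that reappear verbatim in `other`.

    Hypothetical check: lines shorter than `min_len` characters (headings,
    blank lines, short labels) are ignored to avoid trivial matches.
    """
    other_lines = {line.strip() for line in other.splitlines()}
    return [
        line.strip()
        for line in authoritative.splitlines()
        if len(line.strip()) >= min_len and line.strip() in other_lines
    ]
```

An empty result for the text of `context/navigation-budget-discipline.md` against each file that references it would be consistent with the rules living in one place. Paraphrased duplicates would not be caught.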

Notes

Markdown/YAML prompt-and-config only; validation is via the eval scenarios (prompt content is validated behaviorally, not by unit tests).

Colombo D added 4 commits June 7, 2026 09:28
Move the 6 defensive-navigation rules verbatim from session-navigator into
context/navigation-budget-discipline.md (authoritative source). session-navigator
now @mentions it (loading) and re-points its three in-document references; the
always-on awareness file gets a single non-loading pointer row. No rule content
changed; always-on behavior untouched (lean default preserved).

Enrich tool-design with R1 (module vs CLI by consumer, pointing to Standing Rule 3),
R2 (narrow-domain specialization), R3 (progressive discovery → navigation discipline),
plus an event-semantics guard. Add a new context-intelligence-evaluation-methodology
skill (metric design, precursor metrics, A/B + statistical-N; points to eval-design and
digital-twin-universe, never restating DTU-as-default or artifact-as-success). Add a thin
context-intelligence-strategy.md pointer table (non-loading references; names the
event-semantics principle once). Wire the strategy file via the mode's contributes.context
and the eval skill via contributes.skills. Extend the eval-design catalog with structural
scenarios 8-10 and behavioral Scenario C. Always-on behavior untouched.

6a: add PRE/POST-delegation constraints to the mode's file-not-found routing row
    (no preamble before delegate(); relay the facilitator's Part-A question verbatim).
6b: add a Phase-0 RE-ANCHOR rule so off-script user replies are treated as signal
    fragments and the opening question is re-asked, instead of breaking role.
6c: add a 'Pipeline ownership' standing rule to the facilitator countering the
    hooks-skills-visibility leak of brainstorming/using-superpowers mandates — no
    /brainstorm or /systems-design punt; the pipeline is self-contained from Phase 0.

No design-philosophy change; edge-case hardening only. Always-on behavior untouched.

7a: add a seeded-path routing row to the mode — when the activation message already
    contains a clear goal and domain-concepts.md is absent, delegate with
    seed_statement="<verbatim user goal>" (context_depth=none).
7b: add a facilitator 'Seeded entry' variant at the top of Phase 0 — treat the seed as
    the pre-answered Part A, skip the opening question, run the Part-B probe, then open
    with a data-grounded candidate framed on the seed.

Additive new path (does not change the interactive path). Always-on behavior untouched.