feat: capture context-intelligence design knowledge into the mode (eval-driven) by colombod · Pull Request #33 · microsoft/amplifier-bundle-context-intelligence

colombod · 2026-06-07T16:27:50Z

What this adds

Captures the institutional design knowledge for context-intelligence tooling into the mode itself, so it produces deeper, more complete designs and drives its design pipeline more robustly.

Design depth

Bounded-navigation discipline moved to a single authoritative home (context/navigation-budget-discipline.md) and referenced by session-navigator via @mention — one source of truth for keeping disk navigation within a context budget (no duplication).
Tool-design skill enriched (R1/R2/R3): module-vs-CLI selection by consumer, narrow-domain specialization, and progressive discovery — each pointing to its authoritative home.
New evaluation-methodology skill: metric design (quality/efficiency/efficacy), precursor-over-artifact metrics, and A/B + statistical-N discipline.
Thin strategy file + mode wiring that surface this knowledge only when the mode is active (lean always-on preserved).
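As an illustration of the "statistical-N" point above, here is a minimal sketch of sizing an A/B eval before running it. It assumes the metric is a pass/fail success rate per run and uses the standard two-proportion power calculation. The PR description does not say which statistical method the skill prescribes, so the function name `runs_per_arm` and its defaults are hypothetical.

```python
import math
from statistics import NormalDist


def runs_per_arm(p_baseline: float, p_treatment: float,
                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Runs needed in each arm to detect a change in success rate.

    Standard sample-size formula for a two-sided, two-proportion z-test.
    Hypothetical helper: not taken from the bundle itself.
    """
    if p_baseline == p_treatment:
        raise ValueError("success rates must differ to size an A/B comparison")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired detection power

    p_bar = (p_baseline + p_treatment) / 2
    pooled = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    unpooled = z_power * math.sqrt(
        p_baseline * (1 - p_baseline) + p_treatment * (1 - p_treatment)
    )
    return math.ceil((pooled + unpooled) ** 2 / (p_baseline - p_treatment) ** 2)


if __name__ == "__main__":
    # Detecting a lift from a 50% to a 70% success rate.
    print(runs_per_arm(0.5, 0.7))
```

Even a 20-point lift needs on the order of a hundred runs per arm under these defaults, and halving the expected lift roughly quadruples that number. This is why a handful of runs per variant is rarely enough to support an A/B claim.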

Interactive & autonomous driving

The design facilitator opens with its actual Phase-0 question, re-anchors on off-script replies instead of breaking role, and owns its pipeline end-to-end.
A new autonomous/seeded entry path: when the activation message already carries the goal, the facilitator treats it as a pre-answered Phase 0 and proceeds — enabling non-interactive / recipe-driven use.

How the evaluation framework shaped this

The work was driven and validated by an outcome-eval harness with three scenarios (a pre-seeded design run, a multi-turn simulated user, and a one-shot). The evals did more than verify at the end — the baselines reshaped the scope, showing the mode's design depth was the highest-leverage gap. Every change maps to a scenario that measures it, so we built what we set out to build and can show it.

Measured evidence

Scenario	Before (baseline)	After
Seeded design run	timed out before producing an evaluation plan	converges (exit 0, ~4 min) with a deeper Phase-2 design (per-consumer module-vs-CLI table citing the navigation-budget discipline) plus a complete Phase-3 evaluation plan (precision/recall, success criteria, DTU validation)

Single-source/no-duplication and mode-gated injection verified by inspection; lean always-on preserved.
Bundle test suite: 657 passed.

Notes

Markdown/YAML prompt-and-config only; validation is via the eval scenarios (prompt content is validated behaviorally, not by unit tests).

Move the 6 defensive-navigation rules verbatim from session-navigator into context/navigation-budget-discipline.md (authoritative source). session-navigator now @mentions it (loading) and re-points its three in-document references; the always-on awareness file gets a single non-loading pointer row. No rule content changed; always-on behavior untouched (lean default preserved).
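The six rules themselves are not reproduced in this description. As a generic illustration of what keeping disk navigation within a context budget can look like, the sketch below caps a directory listing by entry count and by total characters, and reports when it stopped early. `NavigationBudget` and `bounded_listing` are hypothetical names and the limits are arbitrary; none of this is taken from navigation-budget-discipline.md.

```python
import os
from dataclasses import dataclass


@dataclass
class NavigationBudget:
    """Hypothetical budget: caps on listed entries and on emitted characters."""
    max_entries: int = 50
    max_chars: int = 4000
    entries_used: int = 0
    chars_used: int = 0

    def charge(self, text: str) -> bool:
        """Record one listed entry; return False if it would exceed the budget."""
        if (self.entries_used + 1 > self.max_entries
                or self.chars_used + len(text) > self.max_chars):
            return False
        self.entries_used += 1
        self.chars_used += len(text)
        return True


def bounded_listing(root: str, budget: NavigationBudget) -> tuple[list[str], bool]:
    """List files under root until the budget runs out.

    Returns (relative paths, truncated flag) so the caller knows the view
    is partial instead of assuming it saw everything.
    """
    lines: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if not budget.charge(rel):
                return lines, True
            lines.append(rel)
    return lines, False
```

Returning an explicit truncation flag is the design choice that matters here: a consumer that hits the cap can narrow its query instead of silently reasoning over an incomplete listing.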

Enrich tool-design with R1 (module vs CLI by consumer, pointing to Standing Rule 3), R2 (narrow-domain specialization), R3 (progressive discovery → navigation discipline), plus an event-semantics guard. Add a new context-intelligence-evaluation-methodology skill (metric design, precursor metrics, A/B + statistical-N; points to eval-design and digital-twin-universe, never restating DTU-as-default or artifact-as-success). Add a thin context-intelligence-strategy.md pointer table (non-loading references; names the event-semantics principle once). Wire the strategy file via the mode's contributes.context and the eval skill via contributes.skills. Extend the eval-design catalog with structural scenarios 8-10 and behavioral Scenario C. Always-on behavior untouched.

6a: add PRE/POST-delegation constraints to the mode's file-not-found routing row (no preamble before delegate(); relay the facilitator's Part-A question verbatim).
6b: add a Phase-0 RE-ANCHOR rule so off-script user replies are treated as signal fragments and the opening question is re-asked, instead of breaking role.
6c: add a 'Pipeline ownership' standing rule to the facilitator to counter the hooks-skills-visibility leak of brainstorming/using-superpowers mandates: no /brainstorm or /systems-design punt; the pipeline is self-contained from Phase 0.
No design-philosophy change; edge-case hardening only. Always-on behavior untouched.

7a: add a seeded-path routing row to the mode. When the activation message already contains a clear goal and domain-concepts.md is absent, delegate with seed_statement="<verbatim user goal>" (context_depth=none).
7b: add a facilitator 'Seeded entry' variant at the top of Phase 0. Treat the seed as the pre-answered Part A, skip the opening question, run the Part-B probe, then open with a data-grounded candidate framed on the seed.
Additive new path (does not change the interactive path). Always-on behavior untouched.
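The seeded routing row is prompt text rather than code, but its decision logic can be sketched as follows. This is a minimal illustration under stated assumptions: `seeded_delegation` is a hypothetical function, and whether a message "contains a clear goal" is the model's judgment, so it is passed in as a boolean. Only the `seed_statement` and `context_depth` values come from the commit message.

```python
from typing import Optional


def seeded_delegation(activation_message: str,
                      domain_concepts_exists: bool,
                      goal_is_clear: bool) -> Optional[dict]:
    """Return delegate() arguments for the seeded path, or None.

    None means the seeded row does not apply and the mode's other
    routing rows (for example the interactive path) decide instead.
    """
    if goal_is_clear and not domain_concepts_exists:
        return {
            # The user's goal is relayed verbatim, not paraphrased.
            "seed_statement": activation_message,
            "context_depth": "none",
        }
    return None


if __name__ == "__main__":
    print(seeded_delegation(
        "Design a tool that summarizes long sessions",  # illustrative goal
        domain_concepts_exists=False,
        goal_is_clear=True,
    ))
```

Because the seeded row only fires when both conditions hold, it stays additive: any activation without a clear goal, or with domain-concepts.md already present, falls through to the existing rows unchanged.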

Colombo D added 4 commits June 7, 2026 09:28

colombod wants to merge 4 commits into main from feat/context-intelligence-mode-knowledge
