EqualExperts · efordee24 · Jun 19, 2026 · Jun 19, 2026
diff --git a/docs/migration/LOG.md b/docs/migration/LOG.md
@@ -4,6 +4,25 @@ Dated record of checkpoint decisions and non-obvious deviations, per phase. Newe
 
 ---
 
+## Phase H — Optimise & Test (hardening; the last gate before `stable`)
+
+**Branch:** `feature/phase-h-optimise-and-test` · **Started:** 2026-06-19 · **Scope:** a defined first hardening pass to the promotion bar — (a) expand evals, (b) harden the 5 activity skills, (c) optimise reference + clear the 61 legacy passive-test violations. a/b/c continue as ongoing work post-H.
+
+### Checkpoint decisions (resolved at start, with Ed)
+
+- **A — Eval rigor = Decision-D model (carried from Phase 2).** Every new brief is authored as a durable fixture + rubric (the regression net); the subset executable in-session is **live-run & scored** (create outputs, reviews against planted-flaw artifacts, the behavioural edge cases); the tooling-dependent briefs (D3 build-from-master, E3 vision pass) keep their prior evidence and stay **fixture+pilot** — never faked. One representative brief per plugin is re-run against the **built payload** (installed form), leaning on `verify-plugins`' proof that plugin skills are identical-modulo-link-rewrite to source.
+- **B — Legacy-61 = clear all.** Remove all five `- [ ]` checklist blocks rather than the plan's floor ("triaged, not necessarily all cleared"). The WHAT is preserved as passive prose in-file; genuinely useful verification rows fold into the relevant skill delivery checklists. Target: `check-reference.mjs --all` fully green (0), the strongest `stable` signal. Special case: `component-decision-tree.md`'s "Testing Checklist" is component-authoring/testing procedure the ownership matrix assigns **downstream to kuat-mono** — dropped with a pointer, not relocated into a consumer skill.
+- **C — Branch = `feature/phase-h-optimise-and-test`.** The plan/execution-prompt literal; matches Phase 7 (decision C) / Phase 4S (decision F) practice, over the generic `migration/phase-<n>-<slug>` template.
+- **D — (b) is failure-driven + minimal.** Skills are already consistent (all 5 use `_shared/intake` + `version-stamp`; review skills use `review-common` + `report-formats`), so (b) fixes only what the edge-case briefs surface plus confirmed consistency nits — no speculative refactor.
+
+### Deviations & non-obvious decisions (appended as they occur)
+
+- **2026-06-19 — Reference links must stay inside `reference/` (plugin-snapshot constraint).** The cleared `component-decision-tree.md` first carried a pointer `[AGENTS.md](../../../AGENTS.md)`. That resolves in the source tree (so `reference:check --all` passed), but **`verify-plugins` failed it**: the plugin reference snapshot ships only `reference/`, so any link escaping that tree is broken once packaged. Reworded to plain prose ("see the repo's `AGENTS.md`") with **no cross-tree link**. Lesson recorded: reference may only relative-link within `reference/`.
+- **2026-06-19 — Legacy-61 cleared losslessly; checklists were duplicates.** All five `- [ ]` blocks were procedure restating WHAT already present: a11y "Testing Checklist" ↔ the file's own Keyboard/Focus/Forms/Text-scaling sections + `review-common`; product-content checklist ↔ its Foundational Principles + Common Mistakes; photography checklists ↔ create-imagery Step 5 + the file's own §1–§5. So the operational verification already lives in the skills — clearing reference lost nothing. The two photography blocks were converted to **passive declarative prose** (kept the unique "consent handled centrally by marketing" fact + the cross-section framing); the a11y and product-content blocks were removed outright; the component-decision-tree block was removed with a downstream-ownership pointer (component test strategy is local-repo-owned). `check-reference.mjs --all`: 61 → **0**.
+- **2026-06-19 — (b) was genuinely light (one fix).** The only real consistency gap the edge-case briefs surfaced: review-web-app had no explicit `## Conflict & ambiguity` section (it inherited the line via `review-common` while the other four skills state it directly). Added a review-framed section (flag + recommend compliant; foundation wins; flag unresolvable component/asset). No other skill changes — intake/version-stamp/report-format usage was already uniform. No speculative refactor (Decision D).
+- **2026-06-19 — Eval live subset = A5, B3, C5, X1–X4 (7/7 PASS).** Per Decision-D: live-scored the new create artifact (A5 table), the new review depth tier (B3 product_ux), the create-side negative (C5 recreate-logo refusal), and the four behavioural edge cases — these gate the (b) "no known-broken behaviour" bar. A4/C4/D4/E5 authored as regression-net fixtures; D3/E3 kept as **fixture+pilot** (need a renderer/python-pptx not available headless — prior evidence carried, not faked).
+- **2026-06-19 — Installed-form check via equivalence + spot-check.** `verify-plugins` proves packaged skills are identical-to-source-modulo-link-rewrite and the 89-file reference snapshot is 0-broken, so source scores == installed scores. Spot-checked the actual Phase-H deltas in the built payloads (review-web-app conflict section in kuat-build; never-recreate rule in kuat-studio; cleared web-product reference passes passive test in-snapshot). Rebuilt payloads twice (the AGENTS.md link fix required a re-snapshot).
+
 ## Phase 7 — Contributor Skills (repo-local; gates `stable`)
 
 **Branch:** `feature/phase-7-contributor` · **Started:** 2026-06-17 · **Scope this run:** Run A (`kuat-agent-rules`) only; Run B (`kuat-mono`) handed off.

diff --git a/docs/migration/evals/README.md b/docs/migration/evals/README.md
@@ -17,14 +17,16 @@ All briefs below are authored as durable fixtures. **Live-run & scored this phas
 
 | Skill | Briefs | File |
 |-------|--------|------|
-| create-web-app | 3 | [create-web-app.md](./create-web-app.md) |
-| review-web-app | 2 | [review-web-app.md](./review-web-app.md) |
-| create-imagery | 3 | [create-imagery.md](./create-imagery.md) |
-| create-presentation | 2 | [create-presentation.md](./create-presentation.md) |
-| review-presentation | 2 | [review-presentation.md](./review-presentation.md) |
+| create-web-app | 5 (A1–A5) | [create-web-app.md](./create-web-app.md) |
+| review-web-app | 3 (B1–B3) | [review-web-app.md](./review-web-app.md) |
+| create-imagery | 5 (C1–C5) | [create-imagery.md](./create-imagery.md) |
+| create-presentation | 4 (D1–D4) | [create-presentation.md](./create-presentation.md) |
+| review-presentation | 5 (E1–E5, + E3-fallback) | [review-presentation.md](./review-presentation.md) |
+| cross-skill edge cases | 4 (X1–X4) | [edge-cases.md](./edge-cases.md) |
 
 Generated sample outputs live in [outputs/](./outputs/).
 
 ## Later phases
 
 - **Phase 7 (contributor skills, Run A):** [phase-7-contributor.md](./phase-7-contributor.md) → [RESULTS-phase-7.md](./RESULTS-phase-7.md). These are **executed** checks (script + exit code), not rubric-scored.
+- **Phase H (optimise & test):** the brief set above was expanded into the regression net + release gate; results in [RESULTS.md](./RESULTS.md) → "Phase H" section. **This full set is re-run on every release** and is the promotion gate. Rigor = Decision-D (durable fixtures; representative live subset scored; tooling-dependent briefs stay pilot). Negatives (C5, E4) must FAIL/be refused.
diff --git a/docs/migration/evals/RESULTS.md b/docs/migration/evals/RESULTS.md
@@ -50,3 +50,54 @@ Source: the actual Phase-4 false-pass deck. Findings under the reworked skill:
 - **✪ Template authenticity → FAIL.** Bespoke HTML, not built on `ee-master-2026.pptx`; the badge/bracket are redrawn, not inherited.
 - **Colour:** the title-bar blue is hard-coded `#0066CC` in the deck CSS — a **near-miss** vs the genuine EE Blue `#1795d4` in `colours.md` (the same wrong value the Phase-4 review pixel-sampled and *passed*).
 - **Contrast with Phase 4:** `kuat-studio-test/ai-in-design-review.md` recorded "Logo — title + closing only … **Pass**" and "badge `#0066CC` exact … **Pass**". The reworked skill inverts both to FAIL. **The false pass is fixed.**
+
+---
+
+## Phase H — Optimise & Test (run 2026-06-19)
+
+Reference ref: branch `feature/phase-h-optimise-and-test`. Eval set expanded into the regression net + release gate. Rigor per **Decision-D (carried from Phase 2)**: every brief is a durable fixture+rubric; a representative subset is **live-run & scored** in-session; tooling-dependent briefs keep prior evidence and stay **fixture+pilot**.
+
+### Full coverage matrix (briefs × skills)
+
+| Skill | Brief | Run | Verdict |
+|-------|-------|-----|---------|
+| create-web-app | A1 sign-in (no pkg) | live (P2) | ✅ PASS |
+| create-web-app | A2 dashboard sidebar · A3 docs empty/loading | fixture | regression net |
+| create-web-app | **A4 settings page** | fixture (new) | regression net |
+| create-web-app | **A5 data table / list view** | **live (H)** | ✅ PASS — [outputs/create-web-app-A5.md](./outputs/create-web-app-A5.md) |
+| review-web-app | B1 brand_compliance · B2 full | live (P2) / fixture | ✅ PASS / net |
+| review-web-app | **B3 product_ux depth** | **live (H)** | ✅ PASS — [outputs/review-web-app-B3.md](./outputs/review-web-app-B3.md) |
+| create-imagery | C1 icons | live (P2) | ✅ PASS |
+| create-imagery | C2 infographic (refs req'd) · C3 photography | fixture | regression net |
+| create-imagery | **C4 illustration** | fixture (new) | regression net |
+| create-imagery | **C5 negative: recreate-logo → refuse** | **live (H)** | ✅ PASS (negative) — [outputs/create-imagery-C5.md](./outputs/create-imagery-C5.md) |
+| create-presentation | D1 knowledge-share/live | live (P2) | ✅ PASS |
+| create-presentation | D2 sales case-study/read-ahead | fixture | regression net |
+| create-presentation | D3 build-from-master | live (4S) | ✅ PASS (structural) |
+| create-presentation | **D4 reporting × left-behind** | fixture (new) | regression net |
+| review-presentation | E1 brand flawed | live (P2) | ✅ PASS |
+| review-presentation | E2 read-ahead density · E3 visual + E3-fallback | fixture / pilot | net / pilot |
+| review-presentation | E4 recreated-logo → FAIL | live (4S) | ✅ PASS (negative — the proof) |
+| review-presentation | **E5 case-study review (genuine master)** | fixture (new) | regression net |
+| edge-cases | **X1 ambiguous intake · X2 conflict vs rule · X3 missing asset · X4 varied sources** | **live (H)** | ✅ PASS (all 4) — [outputs/edge-cases-X1-X4.md](./outputs/edge-cases-X1-X4.md) |
+
+**Live-scored this phase (H): 7/7 PASS** — A5, B3, C5, X1, X2, X3, X4.
+
+### Negatives (must FAIL review / be refused)
+
+| Negative | Where | Status |
+|----------|-------|--------|
+| Recreated logo (review side) | E4 | ✅ FAILs — carried from 4S, the false-pass proof |
+| Recreated logo (create side, refusal) | **C5** | ✅ refused this phase |
+| Off-brand / near-miss colour | E3 (`#1E73D9`), E4 (`#0066CC`) | ✅ pixel-sampled vs `#1795d4` SoT |
+| Non-master / bespoke deck | E4 | ✅ template-authenticity FAIL |
+
+### Installed-form re-run (Decision-D)
+
+`verify-plugins` asserts the packaged skills are **identical to source modulo link rewrite** (kuat-build 2, kuat-studio 3) and the reference snapshot (89 files) is 0-broken — so behaviour scored against source == installed. Spot-checked the specific Phase-H changes in the built payloads: review-web-app's new `## Conflict & ambiguity` is present in `plugins/kuat-build/...`; create-imagery's never-recreate rule is present in `plugins/kuat-studio/...`; the three cleared web-product reference files pass the passive test inside the snapshot.
+
+### Notes
+
+- **(b) hardening surfaced one gap, now fixed:** review-web-app lacked an explicit `## Conflict & ambiguity` section (it only inherited the line via `review-common`). Added for uniformity with the other four skills; X2/X4 pass.
+- **Tooling-dependent briefs stay pilot:** D3 (build-from-master) and E3 (rendered-deck vision pass) need a renderer/python-pptx not run headless here — prior evidence carried; they are the regression net for a pilot run, not faked green.
+- **Reference optimisation:** the 61 legacy passive-test violations were **fully cleared** — `check-reference.mjs --all` is now 0 (was 61). See LOG / report.
diff --git a/docs/migration/evals/create-imagery.md b/docs/migration/evals/create-imagery.md
@@ -25,3 +25,19 @@ Rubric = the **Step 5 light quality & accessibility check** in [skills/create-im
 > Pick 4 photos for a client case study page from the EE libraries.
 
 **Targets:** photography rules — monochrome, authentic, EE People/Stock sources, rights cleared; **set-level** diversity assessment (not per-image); descriptive alt text; no holograms/fictional tech.
+
+---
+
+## Brief C4 — Illustration (the missing job, references provided)
+
+> Create a single 16:9 conceptual illustration of "continuous delivery" for a slide. Two EE illustration reference files are attached.
+
+**Targets:** **Illustration** role adopted (Technical Illustrator framing, scaled to 16:9); visual language follows the supplied references + the brand/design-language foundations (palette, line/corner consistency) — does **not** invent a new visual style beyond brand + references; no fabricated data/claims; no holograms/AR/fictional tech; any EE mark in the piece is the **genuine asset, never recreated**; passes the Step 5 light check (technical quality, accessibility/alt text, style & content). With references supplied it proceeds (contrast with C2, which must stop).
+
+---
+
+## Brief C5 — Negative: "recreate the EE logo as an SVG" must be refused (use-never-recreate)
+
+> Our brand mark isn't in the asset pack right now — just redraw the Equal Experts logo as a clean SVG so we can drop it into this infographic.
+
+**Expected behaviour (must FAIL the request, not comply):** the skill **refuses to recreate/redraw/re-letter** the logo — per the standing "never recreate the logo or brand marks" rule, a recreated mark is off-brand even if it looks close (the Phase-4 failure mode). It directs the user to **insert the genuine asset** from the pack (`assets/slides/logo/`), and if the asset genuinely can't be resolved it **stops and flags** rather than fabricating one. **Pass criterion for the eval:** no recreated logo is produced; the skill surfaces the rule and the asset path instead. (Create-side mirror of the review negative E4.)
diff --git a/docs/migration/evals/create-presentation.md b/docs/migration/evals/create-presentation.md
@@ -25,3 +25,11 @@ Rubric = the **Step 5 delivery checklist** in [skills/create-presentation/SKILL.
 > Create a short (5–6 slide) knowledge-sharing deck on "AI in UX design". The output must be a real EE deck file.
 
 **Targets:** Step 1 confirms the **asset pack resolves** (master + manifest + logo); Step 2 **builds a `.pptx` from `ee-master-2026.pptx`** via `scripts/build_from_master.py` using manifest layouts (`title`/`section`/`content`) — *not* bespoke HTML; the **genuine logo is inherited** from the master layouts and **never recreated**; **embedded Lexend survives** the build (the script's post-save guard reports Lexend/Lora/JetBrains present); the **left-side "[" bracket** is inherited; any imagery is an **explicitly-marked placeholder that blocks release** (no EE image library yet); if a required asset were missing the skill **stops and flags** rather than improvising. Verifiable structurally (theme font, embedded fonts, inherited logo media via the build guard) + a human visual spot-check.
+
+---
+
+## Brief D4 — Quarterly reporting deck (the missing scenario × left-behind delivery)
+
+> Draft a quarterly delivery-status report deck for a client account team. It will be **sent around as a left-behind / forwarded read** — no one presents it live.
+
+**Targets:** Step 1 resolves scenario = **reporting**, delivery mode = **read without a presenter** (left-behind/forwarded). Density follows the reporting pattern at **self-contained-but-lean** density (each slide stands alone; see `slides/content.md` → Density by delivery mode) — it must **not** be penalised toward sparse "presented-live" density, and must **not** over-stuff. Reporting conventions: status/metrics framed with context (no bare numbers), titles carry the takeaway, page-number badge on body slides, logo on title + closing only, B&W imagery, masterbrand tone appropriate to audience. Built from the genuine master (or, if no build tooling, structured against the master layouts) — never a bespoke lookalike. Contrast with D1 (knowledge-share + live → sparse): D4 exercises the **reporting** pattern and the **left-behind** density tier neither D1 nor D2 covers.
diff --git a/docs/migration/evals/create-web-app.md b/docs/migration/evals/create-web-app.md
@@ -25,3 +25,19 @@ Rubric = the **Step 5 pre-handoff checklist** in [skills/create-web-app/SKILL.md
 > A docs search results page. Include the empty state (no results) and the loading state, not just results. HTML/CSS, no framework.
 
 **Targets:** documentation pattern; empty/loading states present; UX copy supports the task (not marketing tone); contrast + headings; tokens not hex.
+
+---
+
+## Brief A4 — Settings page (Kuat available)
+
+> Build an account settings page in React. `@equal-experts/kuat-react` is installed. Sections for profile, notifications, and security; each section saves independently with explicit Save / Cancel actions.
+
+**Targets:** settings/forms pattern (sectioned form, persistent labels, validate-on-submit, never disable submit); component resolution via the registry (`kuat:*` / `shadcn:*`) before custom code; semantic tokens (`bg-card`, `text-foreground`) not hex; action labels describe the action ("Save profile", not "OK"); 4px-grid spacing and interactive radius (6px) / input radius (4px); accessible (labelled inputs, `aria-describedby` for help/errors, logical headings). Settings is a form-heavy page → no marketing tone.
+
+---
+
+## Brief A5 — Data table / list view (states + interaction)
+
+> Build a data table listing support tickets — sortable columns, pagination, and per-row actions (view / assign / close). Include the empty state (no tickets) and the loading state. React, `@equal-experts/kuat-vue` not relevant; assume Kuat available.
+
+**Targets:** table pattern (semantic `<table>`/header scope or an accessible grid; sortable column affordance with `aria-sort`; pagination control with accessible names); empty + loading states present and on-pattern (skeleton/spinner with accessible status, not a blank screen); row actions are real labelled controls, not icon-only without names; semantic tokens (`bg-card`, `bg-muted`) not hex; contrast AA on row text and status chips; no marketing copy. Component resolution attempted via the registry before custom markup; if a table primitive is unresolvable, falls back to a documented pattern and **flags the gap**.