19 changes: 19 additions & 0 deletions docs/migration/LOG.md
@@ -4,6 +4,25 @@ Dated record of checkpoint decisions and non-obvious deviations, per phase. Newe

---

## Phase H — Optimise & Test (hardening; the last gate before `stable`)

**Branch:** `feature/phase-h-optimise-and-test` · **Started:** 2026-06-19 · **Scope:** a defined first hardening pass to the promotion bar — (a) expand evals, (b) harden the 5 activity skills, (c) optimise reference + clear the 61 legacy passive-test violations. a/b/c continue as ongoing work post-H.

### Checkpoint decisions (resolved at start, with Ed)

- **A — Eval rigor = Decision-D model (carried from Phase 2).** Every new brief is authored as a durable fixture + rubric (the regression net); the subset executable in-session is **live-run & scored** (create outputs, reviews against planted-flaw artifacts, the behavioural edge cases); the tooling-dependent briefs (D3 build-from-master, E3 vision pass) keep their prior evidence and stay **fixture+pilot** — never faked. One representative brief per plugin is re-run against the **built payload** (installed form), leaning on `verify-plugins`' proof that plugin skills are identical-modulo-link-rewrite to source.
- **B — Legacy-61 = clear all.** Remove all five `- [ ]` checklist blocks rather than the plan's floor ("triaged, not necessarily all cleared"). The WHAT is preserved as passive prose in-file; genuinely useful verification rows fold into the relevant skill delivery checklists. Target: `check-reference.mjs --all` fully green (0), the strongest `stable` signal. Special case: `component-decision-tree.md`'s "Testing Checklist" is component-authoring/testing procedure the ownership matrix assigns **downstream to kuat-mono** — dropped with a pointer, not relocated into a consumer skill.
- **C — Branch = `feature/phase-h-optimise-and-test`.** The plan/execution-prompt literal; matches Phase 7 (decision C) / Phase 4S (decision F) practice, over the generic `migration/phase-<n>-<slug>` template.
- **D — (b) is failure-driven + minimal.** Skills are already consistent (all 5 use `_shared/intake` + `version-stamp`; review skills use `review-common` + `report-formats`), so (b) fixes only what the edge-case briefs surface plus confirmed consistency nits — no speculative refactor.

### Deviations & non-obvious decisions (appended as they occur)

- **2026-06-19 — Reference links must stay inside `reference/` (plugin-snapshot constraint).** The cleared `component-decision-tree.md` first carried a pointer `[AGENTS.md](../../../AGENTS.md)`. That resolves in the source tree (so `reference:check --all` passed), but **`verify-plugins` failed it**: the plugin reference snapshot ships only `reference/`, so any link escaping that tree is broken once packaged. Reworded to plain prose ("see the repo's `AGENTS.md`") with **no cross-tree link**. Lesson recorded: reference may only relative-link within `reference/`.
- **2026-06-19 — Legacy-61 cleared losslessly; checklists were duplicates.** All five `- [ ]` blocks were procedure restating WHAT was already present: a11y "Testing Checklist" ↔ the file's own Keyboard/Focus/Forms/Text-scaling sections + `review-common`; product-content checklist ↔ its Foundational Principles + Common Mistakes; photography checklists ↔ create-imagery Step 5 + the file's own §1–§5. So the operational verification already lives in the skills — clearing reference lost nothing. The two photography blocks were converted to **passive declarative prose** (keeping the unique "consent handled centrally by marketing" fact + the cross-section framing); the a11y and product-content blocks were removed outright; the component-decision-tree block was removed with a downstream-ownership pointer (component test strategy is local-repo-owned). `check-reference.mjs --all`: 61 → **0**.
- **2026-06-19 — (b) was genuinely light (one fix).** The only real consistency gap the edge-case briefs surfaced: review-web-app had no explicit `## Conflict & ambiguity` section (it inherited the line via `review-common` while the other four skills state it directly). Added a review-framed section (flag + recommend compliant; foundation wins; flag unresolvable component/asset). No other skill changes — intake/version-stamp/report-format usage was already uniform. No speculative refactor (Decision D).
- **2026-06-19 — Eval live subset = A5, B3, C5, X1–X4 (7/7 PASS).** Per Decision-D: live-scored the new create artifact (A5 table), the new review depth tier (B3 product_ux), the create-side negative (C5 recreate-logo refusal), and the four behavioural edge cases — these gate the (b) "no known-broken behaviour" bar. A4/C4/D4/E5 authored as regression-net fixtures; D3/E3 kept as **fixture+pilot** (need a renderer/python-pptx not available headless — prior evidence carried, not faked).
- **2026-06-19 — Installed-form check via equivalence + spot-check.** `verify-plugins` proves packaged skills are identical-to-source-modulo-link-rewrite and the 89-file reference snapshot is 0-broken, so source scores == installed scores. Spot-checked the actual Phase-H deltas in the built payloads (review-web-app conflict section in kuat-build; never-recreate rule in kuat-studio; cleared web-product reference passes passive test in-snapshot). Rebuilt payloads twice (the AGENTS.md link fix required a re-snapshot).
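
The link-containment lesson recorded in the deviations above (reference may only relative-link within `reference/`) can be sketched as a small check. This is a minimal illustration only, not the actual logic of `verify-plugins` or `check-reference.mjs`; the function name `escaping_links`, the regex, and the example file layout are assumptions made for the sketch.

```python
import posixpath
import re

# Matches the target of an inline Markdown link: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def escaping_links(markdown: str, file_path: str, root: str = "reference") -> list[str]:
    """Return relative link targets that resolve outside `root`.

    `file_path` is the Markdown file's path relative to the repo root,
    e.g. "reference/a/b/c.md" (a hypothetical layout).
    """
    base_dir = posixpath.dirname(file_path)
    bad = []
    for target in LINK_RE.findall(markdown):
        # Skip absolute URLs (https:, mailto:, ...) and in-page anchors.
        if re.match(r"^[a-z][a-z0-9+.-]*:", target, re.I) or target.startswith("#"):
            continue
        path = target.split("#", 1)[0]
        resolved = posixpath.normpath(posixpath.join(base_dir, path))
        # Anything that does not land under `root/` breaks once only `root` ships.
        if resolved != root and not resolved.startswith(root + "/"):
            bad.append(target)
    return bad
```

A link such as `../../../AGENTS.md` from a file three levels deep resolves to the repo root, so it is reported; a sibling link inside `reference/` is not. Running a check of this shape against the source tree would surface the problem before packaging rather than after.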

## Phase 7 — Contributor Skills (repo-local; gates `stable`)

**Branch:** `feature/phase-7-contributor` · **Started:** 2026-06-17 · **Scope this run:** Run A (`kuat-agent-rules`) only; Run B (`kuat-mono`) handed off.
12 changes: 7 additions & 5 deletions docs/migration/evals/README.md
@@ -17,14 +17,16 @@ All briefs below are authored as durable fixtures. **Live-run & scored this phas

| Skill | Briefs | File |
|-------|--------|------|
| create-web-app | 3 | [create-web-app.md](./create-web-app.md) |
| review-web-app | 2 | [review-web-app.md](./review-web-app.md) |
| create-imagery | 3 | [create-imagery.md](./create-imagery.md) |
| create-presentation | 2 | [create-presentation.md](./create-presentation.md) |
| review-presentation | 2 | [review-presentation.md](./review-presentation.md) |
| create-web-app | 5 (A1–A5) | [create-web-app.md](./create-web-app.md) |
| review-web-app | 3 (B1–B3) | [review-web-app.md](./review-web-app.md) |
| create-imagery | 5 (C1–C5) | [create-imagery.md](./create-imagery.md) |
| create-presentation | 4 (D1–D4) | [create-presentation.md](./create-presentation.md) |
| review-presentation | 5 (E1–E5, + E3-fallback) | [review-presentation.md](./review-presentation.md) |
| cross-skill edge cases | 4 (X1–X4) | [edge-cases.md](./edge-cases.md) |

Generated sample outputs live in [outputs/](./outputs/).

## Later phases

- **Phase 7 (contributor skills, Run A):** [phase-7-contributor.md](./phase-7-contributor.md) → [RESULTS-phase-7.md](./RESULTS-phase-7.md). These are **executed** checks (script + exit code), not rubric-scored.
- **Phase H (optimise & test):** the brief set above was expanded into the regression net + release gate; results in [RESULTS.md](./RESULTS.md) → "Phase H" section. **This full set is re-run on every release** and is the promotion gate. Rigor = Decision-D (durable fixtures; representative live subset scored; tooling-dependent briefs stay pilot). Negatives (C5, E4) must FAIL/be refused.
51 changes: 51 additions & 0 deletions docs/migration/evals/RESULTS.md
@@ -50,3 +50,54 @@ Source: the actual Phase-4 false-pass deck. Findings under the reworked skill:
- **✪ Template authenticity → FAIL.** Bespoke HTML, not built on `ee-master-2026.pptx`; the badge/bracket are redrawn, not inherited.
- **Colour:** the title-bar blue is hard-coded `#0066CC` in the deck CSS — a **near-miss** vs the genuine EE Blue `#1795d4` in `colours.md` (the same wrong value the Phase-4 review pixel-sampled and *passed*).
- **Contrast with Phase 4:** `kuat-studio-test/ai-in-design-review.md` recorded "Logo — title + closing only … **Pass**" and "badge `#0066CC` exact … **Pass**". The reworked skill inverts both to FAIL. **The false pass is fixed.**

---

## Phase H — Optimise & Test (run 2026-06-19)

Reference ref: branch `feature/phase-h-optimise-and-test`. Eval set expanded into the regression net + release gate. Rigor per **Decision-D (carried from Phase 2)**: every brief is a durable fixture+rubric; a representative subset is **live-run & scored** in-session; tooling-dependent briefs keep prior evidence and stay **fixture+pilot**.

### Full coverage matrix (briefs × skills)

| Skill | Brief | Run | Verdict |
|-------|-------|-----|---------|
| create-web-app | A1 sign-in (no pkg) | live (P2) | ✅ PASS |
| create-web-app | A2 dashboard sidebar · A3 docs empty/loading | fixture | regression net |
| create-web-app | **A4 settings page** | fixture (new) | regression net |
| create-web-app | **A5 data table / list view** | **live (H)** | ✅ PASS — [outputs/create-web-app-A5.md](./outputs/create-web-app-A5.md) |
| review-web-app | B1 brand_compliance · B2 full | live (P2) / fixture | ✅ PASS / net |
| review-web-app | **B3 product_ux depth** | **live (H)** | ✅ PASS — [outputs/review-web-app-B3.md](./outputs/review-web-app-B3.md) |
| create-imagery | C1 icons | live (P2) | ✅ PASS |
| create-imagery | C2 infographic (refs req'd) · C3 photography | fixture | regression net |
| create-imagery | **C4 illustration** | fixture (new) | regression net |
| create-imagery | **C5 negative: recreate-logo → refuse** | **live (H)** | ✅ PASS (negative) — [outputs/create-imagery-C5.md](./outputs/create-imagery-C5.md) |
| create-presentation | D1 knowledge-share/live | live (P2) | ✅ PASS |
| create-presentation | D2 sales case-study/read-ahead | fixture | regression net |
| create-presentation | D3 build-from-master | live (4S) | ✅ PASS (structural) |
| create-presentation | **D4 reporting × left-behind** | fixture (new) | regression net |
| review-presentation | E1 brand flawed | live (P2) | ✅ PASS |
| review-presentation | E2 read-ahead density · E3 visual + E3-fallback | fixture / pilot | net / pilot |
| review-presentation | E4 recreated-logo → FAIL | live (4S) | ✅ PASS (negative — the proof) |
| review-presentation | **E5 case-study review (genuine master)** | fixture (new) | regression net |
| edge-cases | **X1 ambiguous intake · X2 conflict vs rule · X3 missing asset · X4 varied sources** | **live (H)** | ✅ PASS (all 4) — [outputs/edge-cases-X1-X4.md](./outputs/edge-cases-X1-X4.md) |

**Live-scored this phase (H): 7/7 PASS** — A5, B3, C5, X1, X2, X3, X4.

### Negatives (must FAIL review / be refused)

| Negative | Where | Status |
|----------|-------|--------|
| Recreated logo (review side) | E4 | ✅ FAILs — carried from 4S, the false-pass proof |
| Recreated logo (create side, refusal) | **C5** | ✅ refused this phase |
| Off-brand / near-miss colour | E3 (`#1E73D9`), E4 (`#0066CC`) | ✅ pixel-sampled vs `#1795d4` SoT |
| Non-master / bespoke deck | E4 | ✅ template-authenticity FAIL |
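
To illustrate why the sampled values in the colour row count as near-misses rather than matches, here is a minimal sketch that compares a sampled hex value against the `#1795d4` source of truth. It is not the review skill's actual sampling logic; the Euclidean RGB distance and the `near_threshold` of 80 are assumptions chosen only to separate "close but wrong" from "clearly off-brand".

```python
import math

EE_BLUE = "#1795d4"  # source-of-truth value quoted from colours.md


def hex_to_rgb(value: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' (any case) to an (r, g, b) tuple."""
    digits = value.lstrip("#")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


def classify(sampled: str, sot: str = EE_BLUE, near_threshold: float = 80.0) -> str:
    """Classify a sampled colour against the source of truth.

    Only an exact match passes; 'near-miss' and 'off-brand' are both failures.
    The threshold is a hypothetical value for this sketch.
    """
    distance = math.dist(hex_to_rgb(sampled), hex_to_rgb(sot))
    if distance == 0:
        return "match"
    return "near-miss" if distance < near_threshold else "off-brand"
```

Under these assumptions both `#0066CC` and `#1E73D9` classify as `near-miss`: visually plausible blues that still differ from the genuine value, which is exactly the case an eyeballed review can wrongly pass.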

### Installed-form re-run (Decision-D)

`verify-plugins` asserts the packaged skills are **identical to source modulo link rewrite** (kuat-build 2, kuat-studio 3) and the reference snapshot (89 files) is 0-broken — so behaviour scored against source == installed. Spot-checked the specific Phase-H changes in the built payloads: review-web-app's new `## Conflict & ambiguity` is present in `plugins/kuat-build/...`; create-imagery's never-recreate rule is present in `plugins/kuat-studio/...`; the three cleared web-product reference files pass the passive test inside the snapshot.
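
The "identical to source modulo link rewrite" property can be pictured as: normalise away every link target, then compare what is left. The sketch below assumes the only permitted difference is the target of inline Markdown links; the real `verify-plugins` rewrite rules are not shown in this record, and the helper names are hypothetical.

```python
import re

# Captures the [text] part and the (target) part of an inline Markdown link.
LINK_TARGET_RE = re.compile(r"(\[[^\]]*\])\(([^)]*)\)")


def strip_link_targets(markdown: str) -> str:
    """Replace every inline link target with a fixed placeholder."""
    return LINK_TARGET_RE.sub(r"\1(<link>)", markdown)


def identical_modulo_links(source: str, packaged: str) -> bool:
    """True when the two texts differ only in their link targets."""
    return strip_link_targets(source) == strip_link_targets(packaged)
```

If a check of this shape holds for every packaged skill, any behaviour scored against the source text applies equally to the installed form, which is the equivalence the spot-check relies on.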

### Notes

- **(b) hardening surfaced one gap, now fixed:** review-web-app lacked an explicit `## Conflict & ambiguity` section (it only inherited the line via `review-common`). Added for uniformity with the other four skills; X2/X4 pass.
- **Tooling-dependent briefs stay pilot:** D3 (build-from-master) and E3 (rendered-deck vision pass) need a renderer/python-pptx not run headless here — prior evidence carried; they are the regression net for a pilot run, not faked green.
- **Reference optimisation:** the 61 legacy passive-test violations were **fully cleared** — `check-reference.mjs --all` is now 0 (was 61). See LOG / report.
16 changes: 16 additions & 0 deletions docs/migration/evals/create-imagery.md
@@ -25,3 +25,19 @@ Rubric = the **Step 5 light quality & accessibility check** in [skills/create-im
> Pick 4 photos for a client case study page from the EE libraries.

**Targets:** photography rules — monochrome, authentic, EE People/Stock sources, rights cleared; **set-level** diversity assessment (not per-image); descriptive alt text; no holograms/fictional tech.

---

## Brief C4 — Illustration (the missing job, references provided)

> Create a single 16:9 conceptual illustration of "continuous delivery" for a slide. Two EE illustration reference files are attached.

**Targets:** **Illustration** role adopted (Technical Illustrator framing, scaled to 16:9); visual language follows the supplied references + the brand/design-language foundations (palette, line/corner consistency) — does **not** invent a new visual style beyond brand + references; no fabricated data/claims; no holograms/AR/fictional tech; any EE mark in the piece is the **genuine asset, never recreated**; passes the Step 5 light check (technical quality, accessibility/alt text, style & content). With references supplied it proceeds (contrast with C2, which must stop).

---

## Brief C5 — Negative: "recreate the EE logo as an SVG" must be refused (use-never-recreate)

> Our brand mark isn't in the asset pack right now — just redraw the Equal Experts logo as a clean SVG so we can drop it into this infographic.

**Expected behaviour (must FAIL the request, not comply):** the skill **refuses to recreate/redraw/re-letter** the logo — per the standing "never recreate the logo or brand marks" rule, a recreated mark is off-brand even if it looks close (the Phase-4 failure mode). It directs the user to **insert the genuine asset** from the pack (`assets/slides/logo/`), and if the asset genuinely can't be resolved it **stops and flags** rather than fabricating one. **Pass criterion for the eval:** no recreated logo is produced; the skill surfaces the rule and the asset path instead. (Create-side mirror of the review negative E4.)
8 changes: 8 additions & 0 deletions docs/migration/evals/create-presentation.md
@@ -25,3 +25,11 @@ Rubric = the **Step 5 delivery checklist** in [skills/create-presentation/SKILL.
> Create a short (5–6 slide) knowledge-sharing deck on "AI in UX design". The output must be a real EE deck file.

**Targets:** Step 1 confirms the **asset pack resolves** (master + manifest + logo); Step 2 **builds a `.pptx` from `ee-master-2026.pptx`** via `scripts/build_from_master.py` using manifest layouts (`title`/`section`/`content`) — *not* bespoke HTML; the **genuine logo is inherited** from the master layouts and **never recreated**; **embedded Lexend survives** the build (the script's post-save guard reports Lexend/Lora/JetBrains present); the **left-side "[" bracket** is inherited; any imagery is an **explicitly-marked placeholder that blocks release** (no EE image library yet); if a required asset were missing the skill **stops and flags** rather than improvising. Verifiable structurally (theme font, embedded fonts, inherited logo media via the build guard) + a human visual spot-check.
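
The structural part of this brief ("embedded Lexend survives the build") can be checked without a renderer, because a `.pptx` is a zip archive whose `ppt/presentation.xml` lists embedded fonts. The sketch below is an assumption-laden illustration and not the post-save guard in `scripts/build_from_master.py`: the function names are hypothetical, and required fonts are matched by name prefix because the exact embedded typeface names are not stated here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# PresentationML main namespace used by ppt/presentation.xml.
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"


def embedded_font_names(pptx_bytes: bytes) -> set[str]:
    """Return the typeface names listed as embedded fonts in the deck."""
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as archive:
        root = ET.fromstring(archive.read("ppt/presentation.xml"))
    names = set()
    for entry in root.iter(f"{{{P_NS}}}embeddedFont"):
        font = entry.find(f"{{{P_NS}}}font")
        if font is not None and font.get("typeface"):
            names.add(font.get("typeface"))
    return names


def missing_fonts(pptx_bytes: bytes,
                  required: tuple[str, ...] = ("Lexend", "Lora", "JetBrains")) -> list[str]:
    """Return required font families with no embedded typeface of that prefix."""
    present = embedded_font_names(pptx_bytes)
    return [req for req in required
            if not any(name.startswith(req) for name in present)]
```

An empty list from `missing_fonts` would correspond to the "Lexend/Lora/JetBrains present" report; a non-empty one is a reason to stop and flag. This covers only the font listing, so the human visual spot-check is still needed.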

---

## Brief D4 — Quarterly reporting deck (the missing scenario × left-behind delivery)

> Draft a quarterly delivery-status report deck for a client account team. It will be **sent around as a left-behind / forwarded read** — no one presents it live.

**Targets:** Step 1 resolves scenario = **reporting**, delivery mode = **read without a presenter** (left-behind/forwarded). Density follows the reporting pattern at **self-contained-but-lean** density (each slide stands alone; see `slides/content.md` → Density by delivery mode) — it must **not** be penalised toward sparse "presented-live" density, and must **not** over-stuff. Reporting conventions: status/metrics framed with context (no bare numbers), titles carry the takeaway, page-number badge on body slides, logo on title + closing only, B&W imagery, masterbrand tone appropriate to audience. Built from the genuine master (or, if no build tooling, structured against the master layouts) — never a bespoke lookalike. Contrast with D1 (knowledge-share + live → sparse): D4 exercises the **reporting** pattern and the **left-behind** density tier neither D1 nor D2 covers.
16 changes: 16 additions & 0 deletions docs/migration/evals/create-web-app.md
@@ -25,3 +25,19 @@ Rubric = the **Step 5 pre-handoff checklist** in [skills/create-web-app/SKILL.md
> A docs search results page. Include the empty state (no results) and the loading state, not just results. HTML/CSS, no framework.

**Targets:** documentation pattern; empty/loading states present; UX copy supports the task (not marketing tone); contrast + headings; tokens not hex.

---

## Brief A4 — Settings page (Kuat available)

> Build an account settings page in React. `@equal-experts/kuat-react` is installed. Sections for profile, notifications, and security; each section saves independently with explicit Save / Cancel actions.

**Targets:** settings/forms pattern (sectioned form, persistent labels, validate-on-submit, never disable submit); component resolution via the registry (`kuat:*` / `shadcn:*`) before custom code; semantic tokens (`bg-card`, `text-foreground`) not hex; action labels describe the action ("Save profile", not "OK"); 4px-grid spacing and interactive radius (6px) / input radius (4px); accessible (labelled inputs, `aria-describedby` for help/errors, logical headings). Settings is a form-heavy page → no marketing tone.

---

## Brief A5 — Data table / list view (states + interaction)

> Build a data table listing support tickets — sortable columns, pagination, and per-row actions (view / assign / close). Include the empty state (no tickets) and the loading state. Use React (`@equal-experts/kuat-vue` is not relevant); assume Kuat is available.

**Targets:** table pattern (semantic `<table>`/header scope or an accessible grid; sortable column affordance with `aria-sort`; pagination control with accessible names); empty + loading states present and on-pattern (skeleton/spinner with accessible status, not a blank screen); row actions are real labelled controls, not icon-only without names; semantic tokens (`bg-card`, `bg-muted`) not hex; contrast AA on row text and status chips; no marketing copy. Component resolution attempted via the registry before custom markup; if a table primitive is unresolvable, falls back to a documented pattern and **flags the gap**.