Three sister Claude Code plugins for running autonomous overnight work sessions that land a polished deliverable, an insight brief, or a stack of reviewed PRs on your desk by morning, with multi-agent review panels baked in to catch factual errors before they reach the client.
| Plugin | When to use |
|---|---|
| overnight-review-client-delivery | You already have a client deliverable (slide deck, report, HTML, memo) that needs polishing + quality-gating before a morning hand-off. Runs Phase A (content work) + Phase B (8-agent review panel in parallel) + Phase C (morning synthesis). |
| overnight-insight-discovery | You want to generate a client-facing insight brief from scratch — surfacing funnel leaks and surprise patterns from data. Runs two parallel tracks (B = LLM-autonomous creative exploration + C = hybrid deterministic-with-narration), consolidates, and reviews. |
| overnight-multi-issue-implementation | You have a cluster of 6–15 related GitHub issues (typically a P1 review-panel finding set) and want them implemented + reviewed + opened as stacked PRs by morning. Runs Phase A (PR1 tasks via subagent-driven-development) + Phase B (PR2 tasks stacked on PR1) + Phase C (PR-level code review + morning hand-off). |
They share the same phase structure, locked-file escape hatch, branch hygiene, and file-first discipline — use them as a set, in pairs, or individually.
Overnight autonomous runs are seductive but brittle. The typical failure modes:
- Hallucinated conclusions. The model "finds" patterns that are restatements of known features, or narrates trivial tautologies as surprising.
- Factual errors ship to the client. A single reviewer (you, sleep-deprived in the morning) misses a wrong BSTS CI, a decomposition table with inverted signs, a mislabelled cohort.
- Stale content dressed as fresh. Author adds an "archive banner" at the top + updates the headline, leaves the body with old numbers — readers can't tell which parts are current.
- Context-window blowup. A 6-hour autonomous run fills the model's window; the session compresses lossy, then drifts.
- Parallel session commit-dropping. Two agents on the same branch silently rebase each other's commits into oblivion.
These plugins encode the hard-won patterns that fix each of these — extracted from real overnight runs that caught real P0 errors before they reached real clients.
Neither plugin trusts the author (or the track) to self-review. A panel of 4–8 specialized reviewers runs on the deliverable — data-scientist, data-analyst, scientific-critical-thinker, client-trust-evaluator, compliance-auditor, qa-expert. A Supreme Judge arbitrates. Dependency: agent-review-panel.
Client-facing files are LOCKED by default. Modifying one requires four conditions: explicit prompt authorization, independent verification (BQ query OR second reviewer confirming), surgical-only edit, and prominent documentation in the morning summary. Without all four, flag as "REQUIRES USER DECISION."
The parent orchestrator never loads working data — reads only small status files. Each track writes state to state/status.json, state/planning_board.md, state/findings/*.md. When context pressure rises, parent dispatches a fresh successor subagent that reads state files and continues. Max 3 hops per track.
When refreshing stale content, never add an archive banner + update headlines in place. Archive the prior version (name_context.html) as a snapshot, then regenerate the active version from the current source of truth. Keeps readers oriented; passes the "would you stake your reputation on this" test.
£0 Cloud Run + £0 Cloud Build + 5 TB BQ read is the recommended envelope. Validated: entire overnight runs complete within this for most client-delivery and insight-discovery sessions. A bq_budget.py wrapper (shipped with overnight-insight-discovery) dry-runs every query, logs to JSONL, aborts on soft-cap hit.
Critical gotcha: parallel Claude sessions on the same repo can silently drop each other's commits via rebase. Use feature/session-NN-claude-A vs feature/session-NN-claude-B. Push immediately after every commit. Treat git reflog as the safety net.
# Add the marketplace
/plugin marketplace add wan-huiyan/overnight-workflows
# Install one or more plugins
/plugin install overnight-review-client-delivery@wan-huiyan-overnight-workflows
/plugin install overnight-insight-discovery@wan-huiyan-overnight-workflows
/plugin install overnight-multi-issue-implementation@wan-huiyan-overnight-workflowsOr clone directly:
git clone https://github.com/wan-huiyan/overnight-workflows.git
cp -R overnight-workflows/plugins/overnight-insight-discovery ~/.claude/skills/
cp -R overnight-workflows/plugins/overnight-review-client-delivery ~/.claude/skills/
cp -R overnight-workflows/plugins/overnight-multi-issue-implementation ~/.claude/skills/Both plugins integrate tightly with:
- agent-review-panel — REQUIRED. 16-phase review protocol with Supreme Judge + HTML dashboard.
- plan-review-integrator — Applies review findings to plans/briefs with rollback on coherence break.
planning-with-files— File-first discipline that makes successor handoff possible.claudeception— Post-run knowledge capture into updated skill versions.
Polish a client deliverable overnight (existing doc/deck):
"Run overnight-review-client-delivery on the Q4 marketing campaign report. The deliverable lives at
deliverables/campaign_impact.html. Canonical numbers are inscoping/expected_metrics.md. Locked files: the three HTMLs going to the client tomorrow. Cap: £0 cloud spend, use BQ queries only."
Discover ah-ha insights overnight (generate from data):
"Run overnight-insight-discovery on Q4 e-commerce data. Target: 2 funnel leaks + 2 surprise patterns for the exec brief on Monday. Fall campaign scope. Cap: 5 TB BQ, 8 hr wall-clock. Client = retail ops team."
Implement an issue cluster overnight (issues → stacked PRs):
"Run overnight-multi-issue-implementation on issues #437–#442 in barryu_application_propensity. Two-PR shape: hardening (#438–#441) + knowledge-gap (#437, #442). Brainstorm + plan first, then subagent-driven execution, code-review subagent before merge. I'm asleep — wake up to merged PRs and follow-up issues filed."
All three plugins will ask for scoping details (target date, known-knowns table, canonical numbers, panel personas, issue cluster, PR shape) before kicking off. Morning output: a PR with the deliverable, a morning summary flagging anything that needs your attention first, and a review-panel HTML dashboard.
- The finished deliverable (Markdown + HTML, client-ready)
- A
morning_summary.mdthat flags the ONE thing to look at first (P0 fixes to locked files, capped loops, unresolved P1s) - A review-panel HTML dashboard with per-round scores and persona-by-persona verdicts
- A PR to
mainwith a DO NOT MERGE banner (you eyeball first) - A
workflow_learnings.mdcapturing concrete recommendations for the next run - Full traceability: every commit per phase, every BQ query, every finding that DIDN'T make the brief (and why)
- These plugins are structured for 8 hour overnight windows. Sub-hour sessions are overkill; multi-day projects need further decomposition.
- They assume a BigQuery-style data warehouse with read access + at least one scratch dataset. Snowflake / Redshift / Postgres will work but the budget wrapper and SHAP compute scripts need adaptation.
- Neither plugin can execute trades, move money, push to production, or run migrations — all side-effecting actions require explicit human authorization.
overnight-insight-discoveryrequires a pre-populated cohort known-knowns table (~30 cells × top-20 features each). Without it, the novelty gate has nothing to enforce.
overnight-insight-discovery → generates the brief from scratch
↓
overnight-review-client-delivery → polishes a known-good brief into client-shape
overnight-multi-issue-implementation → ships a cluster of issues to stacked PRs
(independent track — engineering, not deliverables)
For new insight work, start with overnight-insight-discovery. For existing deliverables that just need polish + QA, go straight to overnight-review-client-delivery. For an engineering issue cluster (typically a P1 review-panel finding set), use overnight-multi-issue-implementation. The first two chain naturally; the third runs as an independent track.
For synchronous end-to-end audit of a live data dashboard (one ~30–45 min round of parallel cluster agents — different shape from the autonomous overnight runs in this repo), see wan-huiyan/dashboard-audit-toolkit. It bundles four sister skills: a parallel-cluster-agents methodology spine, the most common fix-shape produced by the audit, single-metric depth audit, and the GitHub squash-merge gotcha when shipping the fixes.
For a per-instance ML explainability fix-shape — SHAP waterfall charts in production dashboards where a post-hoc calibrator (isotonic regression, Platt scaling) sits between the raw model and the displayed score — see wan-huiyan/shap-waterfall-calibrator-skill. The "rescale to fit the chip" anti-pattern, the negative-scale-guard workaround that trades one bug for another, and the correct fix (apply the calibrator point-by-point to the cumulative-probability path so every bar lives in calibrated space).
All three plugins encode patterns from real overnight runs. overnight-review-client-delivery was validated on a causal-impact project; overnight-insight-discovery was extracted from a university-admissions propensity project; overnight-multi-issue-implementation was extracted from a 12-task chatbox-hardening + knowledge-gap session on the same admissions propensity project (2026-05-08, issues #437–#442 → 2 stacked PRs merged by morning + 5 follow-ups filed). The patterns are generalized for any project that needs autonomous overnight work with quality gates.
- v1.1.0 (2026-05-08) — Adds
overnight-multi-issue-implementationfor the engineering-side overnight pattern (issues → stacked PRs). README updated to reflect three plugins; install + compose sections expanded. - v1.0.0 (2026-04-17) — Initial release bundling two plugins.
overnight-review-client-deliverywas previously a standalone skill; this bundle adds the insight-discovery sibling and unifies the shared patterns (locked-file escape hatch, branch hygiene, file-first successor handoff, archive-and-regenerate).
MIT — see LICENSE.
Patches welcome. The shape of both plugins is still settling — if you run them in production and learn something the skills didn't catch, please open an issue or PR against the relevant SKILL.md or reference doc.
Common contribution targets:
- Additional panel personas for specific domains (medical, financial, legal)
- New yield classes for the adaptive tuning loop (insight-discovery only)
- Platform-specific adaptations of
bq_budget.py(Snowflake, Redshift, Databricks) - Alternative HTML renderers (beyond markdown2)
🤖 Patterns co-developed with Claude Code. All examples in the skills use synthetic data; no client-specific numbers in this repo.