feat: opt-in LLM-based PDE (decision layer + rule guardrails + ghosting kill-switch + audit table) #83
Design for replacing the rule PDE's decision role with an opt-in LLM judge (action + free-text inner_state), keeping pde.rs as fallback + ghost guardrail. Renames ActionType::Reply -> ReplyText, reserves ReplyImage / ReplyTextImage (executor future), and adds a companion_decision_events audit table modelled on companion_insights_events. Wired like chat_input_filter via [tasks.pde_decision]. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address must-fix findings from both reviewers:

- keep ActionType::Proactive (code constructs/consumes it); drop unused serde derive
- add pde::reply_plan constructor so stream never touches pde.rs privates
- run_pde_decision returns status-bearing PdeDecisionRun (feeds audit status CHECK)
- ghost_permitted: 2-arg, hard-safety layers only (score ceded to LLM), ghost_streak from Affinity
- judge runs first (ghost short-circuits vision/input_filter) + one shared history fetch
- inner_state sanitization (prompt-injection control): cap, strip section markers, tests
- explicit ReplyImage/ReplyTextImage match arms (not the Proactive _ catch-all)
- soften replay "wire-identical" -> "outcome-replayable"; audit table -> best-effort telemetry

Ratified decisions: ghost = hard-safety-only; every-turn judge + short-circuit + shared fetch; audit = best-effort.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
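To illustrate the `inner_state` sanitization item above (cap plus section-marker and control-character stripping), here is a minimal sketch. The function name `sanitize_inner_state`, the 280-character cap, and the marker shapes (`#`, `---`, `[[`) are assumptions made for illustration; the PR does not state the actual limit or marker set.

```rust
/// Hypothetical cap: the real limit is not stated in this PR.
const MAX_INNER_STATE_CHARS: usize = 280;

/// Sanitize free-text judge output before it is folded into a prompt.
/// Assumed rules: drop lines that look like section markers, strip
/// control characters, collapse whitespace, then cap the length.
pub fn sanitize_inner_state(raw: &str) -> String {
    let without_markers: String = raw
        .lines()
        // Assumed marker shapes; the real set may differ.
        .filter(|line| {
            let t = line.trim_start();
            !(t.starts_with('#') || t.starts_with("---") || t.starts_with("[["))
        })
        .collect::<Vec<_>>()
        .join(" ");

    // Remove control characters (bell, escape, and so on).
    let printable: String = without_markers
        .chars()
        .filter(|c| !c.is_control())
        .collect();

    // Collapse runs of whitespace, then cap by characters (not bytes)
    // so a multi-byte character is never split.
    let collapsed = printable.split_whitespace().collect::<Vec<_>>().join(" ");
    collapsed.chars().take(MAX_INNER_STATE_CHARS).collect()
}
```

Capping by `chars()` rather than slicing bytes avoids panics on non-ASCII input, which matters for free text produced by a model.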
Product-safety lever for downstream consumers: ghosting=false (default true) disables ghost across the ENTIRE PDE path — LLM verdict, LLM-failure rule fallback, and the pure rule engine (LLM off) — all degrade to ReplyText. Read independently of filter_prompt via pde_ghosting_enabled() (default true). Enforced as a final gate on the computed plan; audit logs proposed_action=ghost / action=reply_text when it fires. Also generalize the core constructor pde::reply_plan -> pde::plan_for so it builds Ghost plans too (fixes the gap where an LLM-honoured ghost had no constructor), and reorder the audit write after the kill-switch so it logs the final acted action. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
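The final-gate behaviour described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `Action`, `Audit`, `apply_ghosting_gate`, and `finalize` are hypothetical names, and only the two actions relevant to the kill-switch are modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    ReplyText,
    Ghost,
}

/// Hypothetical audit record: what was proposed vs. what was acted on.
#[derive(Debug, PartialEq, Eq)]
pub struct Audit {
    /// `None` models the LLM-failure fallback path (no proposal recorded).
    pub proposed_action: Option<Action>,
    pub action: Action,
}

/// Final gate: with ghosting disabled, any computed Ghost degrades to ReplyText.
pub fn apply_ghosting_gate(computed: Action, ghosting_enabled: bool) -> Action {
    match (computed, ghosting_enabled) {
        (Action::Ghost, false) => Action::ReplyText,
        (other, _) => other,
    }
}

/// Apply the gate first, then build the audit record, so the record
/// carries the final acted action rather than the pre-gate plan.
pub fn finalize(llm_proposed: Option<Action>, computed: Action, ghosting_enabled: bool) -> Audit {
    let action = apply_ghosting_gate(computed, ghosting_enabled);
    Audit { proposed_action: llm_proposed, action }
}
```

Placing the gate after every decision source (LLM verdict, fallback, pure rules) means a single check covers all three paths.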
- (codex MAJOR) preserve the LLM's sanitized inner_state when the ghosting kill-switch downgrades an honoured ghost to a reply: thread `hints` in scope and feed the final gate, not plan.context_hints (which a plan_for(Ghost) plan has already dropped). Consistent with the hard-safety guardrail downgrade.
- (codex MAJOR) split §12 ghosting audit assertions by path: LLM-Ok -> proposed_action=ghost; LLM-failure fallback -> proposed_action=NULL; pure-rule/tip -> no audit row.
- (codex MINOR) §8.4: audit spawned after the final acted plan (post kill-switch).
- (opus MINOR) plan_for Proactive arm -> unreachable!() to keep the match total.
- (opus MINOR) §6 audit payload: run.verdict.as_json().or(run.raw), not the non-existent verdict_or_raw field.

Both reviewers: architecture sound; these were spec precision/consistency fixes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e variants Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…izer Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…switch helpers Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… + shared fetch) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or.rs) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…njection) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a `received_requests` assertion to `run_stream_pde_judge_unparseable_falls_back` that verifies at least one upstream request body contains `pde/judge` — proving the judge was actually invoked and then failed open, not silently skipped by `resolve_pde()` returning `None`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
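Deployment, gradual rollout, and hard-gate notes. A few additional rollout-related points (answering common review questions). 1. Default deployment: zero behaviour change. Without configuring 2. Once enabled (having set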
…in cursor) Per codex review P3: resolve_pde() advances the round-robin model cursor as a side effect, but tip turns skip the judge — resolving on a skipped turn skewed model selection for later non-tip judge calls. Gate resolution behind !is_tip. Behaviour-preserving (tip already takes the rule path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
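The cursor-skew fix above can be sketched as follows. `ModelChain`, `resolve`, and `judge_model_for_turn` are hypothetical stand-ins for illustration; the real `resolve_pde()` signature is not shown in this PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin model chain whose `resolve` advances a cursor.
pub struct ModelChain {
    models: Vec<&'static str>,
    cursor: AtomicUsize,
}

impl ModelChain {
    pub fn new(models: Vec<&'static str>) -> Self {
        Self { models, cursor: AtomicUsize::new(0) }
    }

    /// Returns the next model and advances the cursor as a side effect.
    pub fn resolve(&self) -> Option<&'static str> {
        if self.models.is_empty() {
            return None;
        }
        let i = self.cursor.fetch_add(1, Ordering::Relaxed);
        Some(self.models[i % self.models.len()])
    }
}

/// Tip turns skip the judge, so they must not resolve (and thereby
/// advance the cursor); only non-tip turns consume a slot.
pub fn judge_model_for_turn(chain: &ModelChain, is_tip: bool) -> Option<&'static str> {
    if is_tip {
        None
    } else {
        chain.resolve()
    }
}
```

With the gate in place, a tip turn between two judge calls leaves the rotation untouched, so the second judge call still gets the next model in order.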
codex review finished: only one P3, no P0/P1/P2, no blockers; overall verdict: "the implementation is basically sound and the main behaviour is self-consistent". Fixed ( fmt / clippy 🤖 Generated with Claude Code
Summary

Adds an opt-in LLM "Persona Decision Engine" judge that decides each chat turn's action (`reply_text` / `ghost` / reserved `reply_image` / `reply_text_image`) plus a free-text `inner_state` folded into the reply prompt. The existing rule engine (`pde::decide`) is not deleted: it is demoted to a deterministic fallback (when the feature is off or the judge fails) plus a hard-safety guardrail.

- Without `[tasks.pde_decision].filter_prompt`, behaviour is byte-identical to today (the rule engine runs). The judge is wired exactly like `chat_input_filter` (config-driven prompt, model chain + timeout, fail-open).
- `ghosting = false` kill-switch (default `true`): a product-safety lever for downstream consumers that disables ghosting across the entire PDE path (LLM verdict, rule fallback, and the pure rule engine).
- Reserved image actions (`reply_image` / `reply_text_image`): first-class PDE decisions that currently degrade to `reply_text`; the executor (`tasks.chat_image_generation`) is future work.
- `inner_state` is sanitized before reaching the prompt (length cap + structural-marker/control-char stripping) as a prompt-injection control.
- `companion_decision_events` audit table (migration `0028`, modelled on `companion_insights_events`): best-effort per-run telemetry of each judge call (status / proposed-vs-acted action / cost).

Bottom-up across the workspace: `core` (rename + `ghost_permitted` + `plan_for`) → `llm` (`resolve_pde` + `pde_ghosting_enabled`) → `store` (table + writer) → `server` (judge runner + wiring) → config template + docs.

Spec: `docs/superpowers/specs/2026-06-04-llm-based-pde-design.md` (dual-reviewed by Opus + codex over two rounds).

Test Plan

- `cargo test --workspace`: 505 pass (incl. 3 live-judge E2E: ghost short-circuit, `inner_state` injection, fail-open)
- `cargo clippy --workspace --all-targets -- -D warnings`: clean
- `cargo fmt --all --check`: clean
- … `dev`

Notes

- `assistant_action_type` wire strings unchanged.
- Replay is outcome-replayable (the `ghost_decision` flag + stored assistant rows already make replay correct); the audit table is best-effort telemetry only.

🤖 Generated with Claude Code
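The fail-open flow and the reserved-action degradation described in the summary can be sketched roughly as below. All names here (`Decision`, `JudgeError`, `decide_turn`) are hypothetical; the PR's actual types are `ActionType` and `PdeDecisionRun`, whose shapes are not shown in this conversation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    ReplyText,
    Ghost,
    ReplyImage,
    ReplyTextImage,
}

/// Hypothetical judge failure modes.
#[derive(Debug)]
pub enum JudgeError {
    Timeout,
    Unparseable,
}

/// `judge` is `None` when the feature is off, `Some(Err(_))` when the
/// judge ran and failed, and `Some(Ok(_))` on a parsed verdict.
pub fn decide_turn(
    judge: Option<Result<Decision, JudgeError>>,
    rule_fallback: impl FnOnce() -> Decision,
) -> Decision {
    // Fail-open: feature off or judge failure both take the rule path.
    let decided = match judge {
        Some(Ok(verdict)) => verdict,
        Some(Err(_)) | None => rule_fallback(),
    };
    // Reserved image actions have no executor yet, so they degrade to text.
    match decided {
        Decision::ReplyImage | Decision::ReplyTextImage => Decision::ReplyText,
        other => other,
    }
}
```

Taking the fallback as a closure keeps the rule engine lazy: it only runs when the judge is absent or has failed.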