
feat: opt-in LLM-based PDE (decision layer + rule guardrails + ghosting kill-switch + audit table) #83

Merged
enriquephl merged 18 commits into dev from feat/llm-based-pde
Jun 4, 2026


@enriquephl

Member

Summary

Adds an opt-in LLM "Persona Decision Engine" judge that decides each chat turn's action (reply_text / ghost / reserved reply_image / reply_text_image) plus a free-text inner_state folded into the reply prompt. The existing rule engine (pde::decide) is not deleted — it is demoted to a deterministic fallback (when the feature is off or the judge fails) plus a hard-safety guardrail.

  • Off by default. With no [tasks.pde_decision].filter_prompt, behaviour is byte-identical to today (the rule engine runs). The judge is wired exactly like chat_input_filter (config-driven prompt, model chain + timeout, fail-open).
  • Ghost is now an LLM decision, gated only by hard-safety vetoes (never ghost in the first 10 messages / twice in a row / within the 1h cooldown). The crude score formula no longer drives the decision; that judgement is ceded to the LLM.
  • ghosting = false kill-switch (default true) — a product-safety lever for downstream consumers: disables ghosting across the entire PDE path (LLM verdict, rule fallback, and the pure rule engine).
  • Reserved image actions (reply_image / reply_text_image) — first-class PDE decisions that currently degrade to reply_text; the executor (tasks.chat_image_generation) is future work.
  • inner_state is sanitized before reaching the prompt (length cap + structural-marker/control-char stripping) — prompt-injection control.
  • companion_decision_events audit table (migration 0028, modelled on companion_insights_events) — best-effort per-run telemetry of each judge call (status / proposed-vs-acted action / cost).
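The three hard-safety vetoes on ghost listed above can be sketched roughly as follows. This is an illustration only: the `GhostContext` struct, its field names, and the constants are hypothetical, and the real `ghost_permitted` is described in the commits as a 2-argument function whose exact signature is not shown in this PR. "Twice in a row" is interpreted here as "the previous turn was already a ghost".

```rust
use std::time::Duration;

/// Hypothetical snapshot of the state the guardrail needs.
/// Field names are illustrative, not the crate's real types.
#[derive(Debug, Clone, Copy)]
pub struct GhostContext {
    /// Total messages exchanged so far in this relationship.
    pub message_count: u32,
    /// Number of consecutive ghosts immediately before this turn.
    pub ghost_streak: u32,
    /// Time elapsed since the last ghost, if there has ever been one.
    pub since_last_ghost: Option<Duration>,
}

/// Assumed thresholds, taken from the PR summary (10 messages, 1 hour).
const MIN_MESSAGES_BEFORE_GHOST: u32 = 10;
const GHOST_COOLDOWN: Duration = Duration::from_secs(60 * 60);

/// Returns true only if none of the three hard-safety vetoes fires.
/// No score is consulted: whether to ghost at all is the LLM's call.
pub fn ghost_permitted(ctx: &GhostContext) -> bool {
    // Veto 1: never ghost in a new relationship.
    if ctx.message_count < MIN_MESSAGES_BEFORE_GHOST {
        return false;
    }
    // Veto 2: never ghost twice in a row.
    if ctx.ghost_streak >= 1 {
        return false;
    }
    // Veto 3: never ghost again inside the cooldown window.
    if let Some(elapsed) = ctx.since_last_ghost {
        if elapsed < GHOST_COOLDOWN {
            return false;
        }
    }
    true
}
```

A vetoed ghost is not an error; per the summary it simply degrades to reply_text.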

Bottom-up across the workspace: core (rename + ghost_permitted + plan_for) → llm (resolve_pde + pde_ghosting_enabled) → store (table + writer) → server (judge runner + wiring) → config template + docs.
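For the inner_state sanitization mentioned in the summary (length cap plus structural-marker and control-character stripping), a minimal sketch might look like the following. The function name, the 280-character cap, and the exact set of stripped marker characters are assumptions for illustration; the PR does not state the real values.

```rust
/// Assumed cap, in characters; the real limit is not stated in this PR.
pub const INNER_STATE_MAX_CHARS: usize = 280;

/// Hypothetical sanitizer for the judge's free-text `inner_state`.
/// It turns control characters into spaces, drops characters that could
/// open or close a prompt section such as `[inner_state]`, collapses
/// whitespace, and caps the length.
pub fn sanitize_inner_state(raw: &str) -> String {
    let stripped: String = raw
        .chars()
        // Newlines, tabs, and other control characters become plain spaces.
        .map(|c| if c.is_control() { ' ' } else { c })
        // Assumed set of structural markers; adjust to the real prompt format.
        .filter(|c| !matches!(c, '[' | ']' | '<' | '>' | '#'))
        .collect();

    // Collapse runs of whitespace so the text stays on one tidy line.
    let collapsed = stripped.split_whitespace().collect::<Vec<_>>().join(" ");

    // Cap by characters (not bytes) so multi-byte text is never split.
    let capped: String = collapsed.chars().take(INNER_STATE_MAX_CHARS).collect();
    capped.trim_end().to_string()
}
```

Capping by `chars()` rather than slicing bytes avoids panics on non-ASCII input, which matters for a multilingual chat product.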

Spec: docs/superpowers/specs/2026-06-04-llm-based-pde-design.md (dual-reviewed by Opus + codex over two rounds).

Test Plan

  • cargo test --workspace — 505 pass (incl. 3 live-judge E2E: ghost short-circuit, inner_state injection, fail-open)
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --all --check — clean
  • PDE-OFF path verified byte-identical (existing stream tests unchanged & green)
  • codex review
  • CI green before squash-merge to dev

Notes

  • No SSE protocol / API-surface change; assistant_action_type wire strings unchanged.
  • No new replay-critical persistence (existing ghost_decision flag + stored assistant rows already make replay correct); the audit table is best-effort telemetry only.

🤖 Generated with Claude Code

enriquephl and others added 17 commits June 4, 2026 04:37
Design for replacing the rule PDE's decision role with an opt-in LLM judge
(action + free-text inner_state), keeping pde.rs as fallback + ghost guardrail.
Renames ActionType::Reply -> ReplyText, reserves ReplyImage / ReplyTextImage
(executor future), and adds a companion_decision_events audit table modelled on
companion_insights_events. Wired like chat_input_filter via [tasks.pde_decision].

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address must-fix findings from both reviewers:
- keep ActionType::Proactive (code constructs/consumes it); drop unused serde derive
- add pde::reply_plan constructor so stream never touches pde.rs privates
- run_pde_decision returns status-bearing PdeDecisionRun (feeds audit status CHECK)
- ghost_permitted: 2-arg, hard-safety layers only (score ceded to LLM), ghost_streak from Affinity
- judge runs first (ghost short-circuits vision/input_filter) + one shared history fetch
- inner_state sanitization (prompt-injection control): cap, strip section markers, tests
- explicit ReplyImage/ReplyTextImage match arms (not the Proactive _ catch-all)
- soften replay "wire-identical" -> "outcome-replayable"; audit table -> best-effort telemetry

Ratified decisions: ghost = hard-safety-only; every-turn judge + short-circuit + shared
fetch; audit = best-effort.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Product-safety lever for downstream consumers: ghosting=false (default true)
disables ghost across the ENTIRE PDE path — LLM verdict, LLM-failure rule
fallback, and the pure rule engine (LLM off) — all degrade to ReplyText. Read
independently of filter_prompt via pde_ghosting_enabled() (default true).
Enforced as a final gate on the computed plan; audit logs proposed_action=ghost
/ action=reply_text when it fires.

Also generalize the core constructor pde::reply_plan -> pde::plan_for so it
builds Ghost plans too (fixes the gap where an LLM-honoured ghost had no
constructor), and reorder the audit write after the kill-switch so it logs the
final acted action.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- (codex MAJOR) preserve the LLM's sanitized inner_state when the ghosting
  kill-switch downgrades an honoured ghost to a reply: thread `hints` in scope
  and feed the final gate, not plan.context_hints (which a plan_for(Ghost) plan
  has already dropped). Consistent with the hard-safety guardrail downgrade.
- (codex MAJOR) split §12 ghosting audit assertions by path: LLM-Ok ->
  proposed_action=ghost; LLM-failure fallback -> proposed_action=NULL;
  pure-rule/tip -> no audit row.
- (codex MINOR) §8.4: audit spawned after the final acted plan (post kill-switch).
- (opus MINOR) plan_for Proactive arm -> unreachable!() to keep the match total.
- (opus MINOR) §6 audit payload: run.verdict.as_json().or(run.raw), not the
  non-existent verdict_or_raw field.

Both reviewers: architecture sound; these were spec precision/consistency fixes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e variants

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…izer

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…switch helpers

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… + shared fetch)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or.rs)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…njection)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a `received_requests` assertion to
`run_stream_pde_judge_unparseable_falls_back` that verifies at least one
upstream request body contains `pde/judge` — proving the judge was
actually invoked and then failed open, not silently skipped by
`resolve_pde()` returning `None`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@enriquephl

Member Author

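Deployment, staged rollout, and hard gates

A few rollout-related notes (answering questions reviewers commonly ask).

1. Default deployment: zero behaviour change

Without [tasks.pde_decision].filter_prompt configured, the judge never runs, costs nothing, and observable behaviour stays byte-identical (the original rule engine runs). Merging this PR and shipping it with the default config is safe.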

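2. With the feature on (filter_prompt set): what changes and what does not

Changes / enhancements:

  • The reply/ghost decision and the tone move from rules to the LLM
  • ghost moves from the crude score formula to an LLM judgement (bounded by the hard guardrails), closer to how a real person leaves a message on read
  • The [inner_state] section, previously always empty, now carries an emotional tone written by the LLM
  • Each turn adds one blocking judge round trip before the first token (latency + cost)
  • companion_decision_events starts receiving audit rows

Unchanged:

  • The ghost mechanism (silent, produces no text, ghost_decision flag, affinity penalty)
  • The SSE protocol / wire frames and the assistant_action_type DB strings
  • affinity_evaluation post-processing and replay
  • ActionType::Reply → ReplyText is a purely internal rename with no external effect
  • input_filter / vision / output_filter (the only optimisation: with PDE on, input_filter reuses the history fetch PDE already made, saving one round trip; with PDE off, nothing changes)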

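3. The unused Proactive action has no effect

Proactive has to stay in the enum: pde::decide constructs it for ProactiveTrigger/AppOpen, and post_process still matches on it. But run_stream's event is always UserMessage, so on this path pde::decide only produces ReplyText/Ghost, the verdict schema has no Proactive, and guard_action only outputs ReplyText/Ghost. The unreachable!() in plan_for(Proactive) can never be hit: no path feeds Proactive into plan_for, because the rule fallback goes through pde::decide, which builds the plan directly. Conclusion: dead but present, exactly as before this PR; it does not affect behaviour and cannot panic.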

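4. Recommended three-stage rollout (ghosting off first)

  1. Ship with PDE entirely off (no filter_prompt) → zero change, zero cost; get the code into production first
  2. Then turn PDE on with ghosting = false → the LLM only decides reply_text + tone and can never make the companion suddenly go silent; validate quality / latency / cost at this stage
  3. Finally set ghosting = true (the default) → ghost is allowed; this is the step that carries real behavioural risk

Cost detail: the judge-call cost starts the moment PDE is turned on (stage 2); toggling ghosting does not change it (the judge runs every turn, and ghosting only decides whether ghost is a permitted outcome). So whether the cost is acceptable is evaluated at stage 2; flipping ghosting = true is purely a UX/behaviour-risk decision.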
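The three rollout stages are fully determined by two knobs, which can be sketched as below. `PdeSettings` and `RolloutStage` are hypothetical names introduced for illustration; the sketch also assumes that "not configured" means the prompt is absent (`None`), which this PR does not spell out.

```rust
/// Hypothetical in-memory view of the two knobs discussed above.
#[derive(Debug, Clone)]
pub struct PdeSettings {
    /// Mirrors `[tasks.pde_decision].filter_prompt`; `None` means the judge never runs.
    pub filter_prompt: Option<String>,
    /// Mirrors the `ghosting` kill-switch.
    pub ghosting: bool,
}

impl Default for PdeSettings {
    /// Default deployment: judge off, ghosting left at its default of true.
    fn default() -> Self {
        Self { filter_prompt: None, ghosting: true }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RolloutStage {
    /// Stage 1: rule engine only, no judge call, no judge cost.
    RuleEngineOnly,
    /// Stage 2: judge runs every turn but ghost is never a permitted outcome.
    JudgeReplyOnly,
    /// Stage 3: judge runs and may ghost, subject to the hard-safety vetoes.
    JudgeWithGhost,
}

impl PdeSettings {
    /// Judge cost is incurred exactly when this returns true.
    pub fn judge_enabled(&self) -> bool {
        self.filter_prompt.is_some()
    }

    pub fn stage(&self) -> RolloutStage {
        match (self.judge_enabled(), self.ghosting) {
            (false, _) => RolloutStage::RuleEngineOnly,
            (true, false) => RolloutStage::JudgeReplyOnly,
            (true, true) => RolloutStage::JudgeWithGhost,
        }
    }
}
```

Note that `judge_enabled` does not depend on `ghosting`, which matches the cost remark above: the cost question is settled at stage 2.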

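5. The full set of hard gates and where PDE sits

In one sentence: the LLM proposes, deterministic gates decide. The judge runs first within the decision step (as proposer), and the hard gates that rule on its proposal all run after it and can only downgrade, never upgrade.

Order of steps for one UserMessage turn:

① Idempotency gate (upsert_user_message_idempotent)      ← outside run_stream, first of all; a replay returns the already-persisted result
② Tip gate: tip → forced reply, judge skipped            ← deterministic, bypasses PDE
③ PDE judge (proposes action + inner_state)              ← new LLM step, runs first within the decision
④ guard_action: hard-safety ghost veto + image downgrade ← deterministic, after the judge
     · new relationship (<10 messages) / 2 in a row / 1h cooldown → veto ghost → reply
     · reply_image / reply_text_image → reply_text (executor not shipped yet)
⑤ ghosting kill-switch (config)                          ← deterministic, last gate, covers the whole path
   ├─ final ghost → ghost arm (silent, no further gates)
   └─ reply ↓
⑥ vision describe → input_filter rewrite                 ← reply arm only
⑦ build_reply_request: allow_traits/tier gating + prompt assembly (including inner_state + iron rules)
⑧ chat generation → (optional) output_filter rewrite     ← content safety at the very end

Does PDE sit before or after the hard gates? The judge runs before them (it is the proposer), and the hard gates rule on it afterwards:

  • ② tip and ① idempotency: before the judge (a tip bypasses the judge entirely)
  • ④ hard-safety ghost veto, ⑤ ghosting, image downgrade: after the judge. The judge proposes ghost, and these gates can cut it back to reply, but they can never go the other way and turn a reply into a ghost
  • ⑦ allow_traits/tier gating, the iron rules (no minors, no self-harm, etc.), ⑧ output_filter: all further downstream (reply arm only, at generation time); PDE does not touch them at all

Conclusion: adding PDE weakens none of the existing hard gates. It slots into the "reply or ghost" layer and replaces that layer's crude score with an LLM plus hard-safety guardrails; the downstream content / permission / iron-rule gates stay untouched and in place behind it.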
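The downgrade-only property of steps ④ and ⑤ can be illustrated with a small sketch. The `Action` enum and both function signatures are simplified stand-ins for illustration (the real `guard_action` and kill-switch operate on richer plan types not shown in this PR); what the sketch tries to capture is that neither gate can ever produce a ghost that the judge did not propose.

```rust
/// Simplified stand-in for the judge's possible verdicts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    ReplyText,
    Ghost,
    ReplyImage,
    ReplyTextImage,
}

/// Step 4 (assumed shape): hard-safety ghost veto plus image downgrade.
/// The output is always ReplyText or Ghost.
pub fn guard_action(proposed: Action, ghost_permitted: bool) -> Action {
    match proposed {
        Action::Ghost if ghost_permitted => Action::Ghost,
        // A vetoed ghost degrades to a plain reply.
        Action::Ghost => Action::ReplyText,
        // Image actions are reserved; the executor is future work.
        Action::ReplyImage | Action::ReplyTextImage => Action::ReplyText,
        Action::ReplyText => Action::ReplyText,
    }
}

/// Step 5 (assumed shape): the config kill-switch, applied last.
pub fn apply_kill_switch(action: Action, ghosting_enabled: bool) -> Action {
    if action == Action::Ghost && !ghosting_enabled {
        Action::ReplyText
    } else {
        action
    }
}

/// Steps 4 and 5 chained: the action the turn actually takes.
pub fn final_action(proposed: Action, ghost_permitted: bool, ghosting_enabled: bool) -> Action {
    apply_kill_switch(guard_action(proposed, ghost_permitted), ghosting_enabled)
}
```

Because every branch either keeps the proposal or maps it to `ReplyText`, a ghost can only survive if the judge proposed it, the hard-safety check permits it, and ghosting is enabled.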


🤖 Generated with Claude Code

…in cursor)

Per codex review P3: resolve_pde() advances the round-robin model cursor as a
side effect, but tip turns skip the judge — resolving on a skipped turn skewed
model selection for later non-tip judge calls. Gate resolution behind !is_tip.
Behaviour-preserving (tip already takes the rule path).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@enriquephl

Member Author

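codex review finished: a single P3, no P0/P1/P2, no blockers. Overall verdict: "the implementation is basically fine and the main behaviour is self-consistent".

Fixed (5742083): [P3] on tip turns, which skip the judge, resolve_pde() was still called and advanced the round-robin model cursor for nothing, perturbing model selection for later real judge calls. Changed to let resolved_pde = if is_tip { None } else { state.model_config.resolve_pde() };. Behaviour is preserved (tip already took the rule path); it just no longer touches the cursor. This only affects setups where pde_decision.model is configured as an array.

fmt / clippy -D warnings / the 3 PDE judge E2E tests + the tip test are all green.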
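To see why the P3 fix matters, here is a minimal sketch of a round-robin resolver whose lookup has a side effect. `ModelChain` and `resolve_for_turn` are hypothetical names; the real `resolve_pde()` lives on the model config and returns a richer type. The point is only that calling the resolver on a turn that will not use the result shifts which model later turns get.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin model chain: every `resolve` call advances the cursor.
pub struct ModelChain {
    models: Vec<String>,
    cursor: AtomicUsize,
}

impl ModelChain {
    pub fn new(models: Vec<String>) -> Self {
        Self { models, cursor: AtomicUsize::new(0) }
    }

    /// Returns the next model and advances the cursor as a side effect.
    pub fn resolve(&self) -> Option<&str> {
        if self.models.is_empty() {
            return None;
        }
        let i = self.cursor.fetch_add(1, Ordering::Relaxed) % self.models.len();
        Some(self.models[i].as_str())
    }
}

/// Mirrors the shape of the fix: tip turns never touch the resolver,
/// so they cannot skew model selection for later judge calls.
pub fn resolve_for_turn(chain: &ModelChain, is_tip: bool) -> Option<&str> {
    if is_tip {
        None
    } else {
        chain.resolve()
    }
}
```

With a two-model chain, a tip turn between two normal turns leaves the rotation intact: the normal turns still alternate between the first and second model.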

🤖 Generated with Claude Code

enriquephl merged commit 5f5271b into dev on Jun 4, 2026
6 checks passed