Add Hooks & Governance subsection with LLM Dark Patterns Hooks #224

Open
waitdeadai wants to merge 1 commit into webfuse-com:main from waitdeadai:add-llm-dark-patterns

waitdeadai commented May 11, 2026

Update 2026-05-16 — five days after this PR was opened, the suite is now at v1.0.0, with 31 wired hooks (up from 10 at PR-open / 28 at the earlier May-11 update). PR #10 merging the v1.0.0 description into main of waitdeadai/llm-dark-patterns landed today. A self-hosted marketplace is now live at github.com/waitdeadai/claude-plugins — install verified end-to-end on Claude Code v2.1.143:

claude plugin marketplace add waitdeadai/claude-plugins
claude plugin install llm-dark-patterns@waitdeadai-plugins

The Anthropic community marketplace v1.0.0 submission is in queue today (anthropics/claude-plugins-official#1887 tracks the Published-but-not-yet-listed delay shared by many publishers). The plugin is installable today via the self-hosted route regardless. Per-hook security audit lives at SECURITY_AUDIT.md; threat model at README.md §Threat model.


Summary

Adds a new 🪝 Hooks & Governance (Community) subsection under the existing 🛠️ Claude Code & Model Context Protocol (MCP) section, with the first entry: LLM Dark Patterns Hooks.

Why a new subsection

The existing taxonomy doesn't cleanly fit community-built hook suites for runtime LLM behavioral safety:

  • Extensions & Integrations is currently scoped to IDE/Browser extensions
  • Community Curated Lists is for awesome-list aggregators
  • The Claude Code subsection is for official Anthropic resources

The new subsection makes the "runtime hook governance" lane discoverable as it grows. Happy to revise the placement if you'd prefer a different one.

What the entry is

LLM Dark Patterns Hooks — Apache-2.0 hook suite that pattern-matches the textual signature of documented LLM dark patterns at the Claude Code Stop / SubagentStop / TaskCreated / TaskCompleted / PreToolUse lifecycle events. The judge is bash + jq — out-of-band, deterministic, and no LLM participates in the verdict path. That means prompt text inside the model context cannot directly rewrite the judge, while lexical evasion, hook misconfiguration, and runtime bypass remain explicit limitations.
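As a rough illustration of the out-of-band judge described above, the sketch below shows how a deterministic shell check might flag a response using fixed regular expressions. It is not taken from the suite: the function names `judge_text` and `judge_payload`, the two example patterns, and the `text` field are hypothetical and chosen only for illustration. It assumes, following the Claude Code hooks documentation, that a hook receives a JSON payload on stdin and that exit status 2 signals a blocking result.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: not code from the llm-dark-patterns suite.
# The patterns and function names below are hypothetical.

# Fixed, case-insensitive extended regexes. The verdict depends only on these
# patterns, so the same input always yields the same result.
PATTERNS='you are absolutely right|great question'

# judge_text TEXT
# Prints "block" and returns 2 if TEXT matches a pattern;
# otherwise prints "allow" and returns 0.
judge_text() {
  if printf '%s' "$1" | grep -Eiq "$PATTERNS"; then
    echo block
    return 2
  fi
  echo allow
  return 0
}

# judge_payload
# Reads a JSON payload from stdin and judges one string field.
# ASSUMPTION: the field name "text" is a placeholder; a real hook would read
# whatever fields the Claude Code event payload actually provides.
judge_payload() {
  text=$(jq -r '.text // ""') || return 1
  judge_text "$text"
}
```

A hook script built this way could end with `judge_payload; exit $?`, so that a match exits with status 2. Because the check is purely lexical, a rephrased response that avoids the listed patterns passes unflagged, which is the lexical-evasion limitation noted above.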

Suite expanded substantially after PR opened (2026-05-11):

| Metric | At PR open | Now (HEAD) |
|---|---|---|
| Hooks | 10 | 28 |
| Stress fixtures | 168 | 337 |
| Pack loader unit tests | 0 | 17 |
| Locale packs | inline only | 6 (en/es/pl/de/fr/pt) |
| Branches | 3 (interaction-style, fact-fabrication, continuity) | 6 (+ multi-agent orchestration, agentic safety, power-user polish) |

The new hooks map to recent LLM safety research:

  • no-wrap-up: DarkBench User Retention
  • no-aggregator-hallucination, no-silent-worker-success: Anthropic multi-agent blog (Jun 2025) and arXiv:2604.14228
  • no-handoff-loop: gurusup (May 2026)
  • no-credential-leak: AgentLeak (arXiv:2602.11510)
  • no-sandbagging-disguise: Anthropic Opus 4.6 Sabotage Risk Report

Coverage maps to academic literature

  • DarkBench (Kran et al. 2025, ICLR 2025, arXiv:2503.10728)
  • DarkBench+ (Liu et al. 2026, AAAI 2026 main conference, ~40 LLMs across 10 categories)
  • AAAI 2026 Spring Symposium (Li et al. 2026, sycophancy at 91.7% prevalence)
  • AgentLeak (arXiv:2602.11510v2, Mar 2026) — credential leak benchmark
  • Anthropic multi-agent research (Jun 2025) — silent failure cascade
  • arXiv:2604.14228 (Apr 2026) — "silent mistakes, not crashes" is the dominant 2026 multi-agent failure mode

Receipts

  • 337-fixture stress suite, CI green at HEAD (tests/stress/)
  • 17 pack loader unit tests
  • Apache-2.0, plugin marketplace submission to Anthropic in queue
  • Single bash file per hook (~50-150 lines); jq is the only dependency for most hooks

Diff

1 file changed, 6 insertions (+), 0 deletions (-). Pure documentation addition, no breaking changes.

Adds a new "🪝 Hooks & Governance (Community)" subsection under
Claude Code & MCP, with the first entry: LLM Dark Patterns Hooks.

The subsection slot was missing — community-built hook suites for
runtime LLM behavioral safety don't fit Extensions & Integrations
(IDE/browser) or Community Curated Lists (awesome-list aggregators).
This new subsection makes that lane discoverable.

LLM Dark Patterns Hooks: 10-hook Apache-2.0 suite mapped to documented
academic literature (DarkBench, DarkPatterns-LLM, AAAI 2026 sycophancy,
ACM IUI 2025 false-memory). Out-of-band bash + jq judge, 168-fixture
stress test, plugin marketplace submission queued.

Repo: https://github.com/waitdeadai/llm-dark-patterns