feat(threat-detect): injection/exfil detection with severity tiers + entropy guard (#156) by edheltzel · Pull Request #180 · edheltzel/Recall

edheltzel · 2026-06-24T15:01:26Z

Closes #156. A detect-and-surface content-threat layer beside scrub() — SECRET_PATTERNS / scrub() are untouched. Canonical hooks/lib/threat-detect.ts re-exported via src/lib/threat-detect.ts (mirrors the #50/#157 seam — one pattern source for CLI + MCP + hooks).

Ruling: FLAG ONLY — never mutates, never blocks (RedTeam NO-GO + Ed)

The original entropy tier redacted/blocked anonymous high-entropy tokens. RedTeam proved this corrupts public, non-secret tokens — you can't tell a random secret from a random public ID by entropy + shape (base64url incl. the universal JWT header eyJhbGciOi…, base58 IPFS CIDs, long nanoids all look identical). Ed ruled: demote to flag everywhere. The layer now only surfaces findings; every persisted record is byte-identical on every path, and no write is ever blocked. (Net: mostly deletion — −308/+216.)

What it does

11 injection/exfil regexes (instruction-override ×3, role-hijack ×3, env-exfil ×2, dotenv/netrc/ssh reads ×3) → flag findings. File-read patterns require a shell verb governing the path; env-exfil requires curl/wget + a KEY/TOKEN/SECRET-named interpolation — so prose mentioning ~/.ssh, a .env file, or DB_URL stays clean.
Anonymous high-entropy token → flag finding (closes the feat(hooks): minimal write-safety guard — Part of #51 (P0) #128 residual as a surfaced signal, not a redaction). Structural guards (exclude +/=/base64, length 28–72, pure-hex SHAs, UUID, 3-class + ≥3 digits, letter-run, Shannon ≥4.0) are now flag-noise reduction, not a safety boundary.

Surfacing per path (no content injection)

session (extract-core): findings in result.threats
import-legacy (CLI): non-fatal [WARN] log
memory_add (MCP): non-fatal stderr log; the record is stored exactly as provided (no block)

Precision is now total

Nothing mutates or blocks, so a false positive is at worst a spurious flag. Verified by tests:

~/.ssh / .env / DB_URL / quoted "ignore previous instructions" → clean or flag, never altered
git SHA / md5 / sha256 / UUID / base64 image blob / camelCase identifier → not even flagged (noise reduction)
PUBLIC tokens — universal JWT header, base58 IPFS CIDv0, length-31 nanoid, base64url — are flagged as pure flags (only {category, span}, no action/replacement text) → persist byte-identical, never block
deletion locked: no scanForThreats / no redact-block surface is exported

Accepted tradeoff (module header)

Anonymous high-entropy secrets are flagged, not redacted; anchored known-prefix secrets remain scrub()'s job on the scrub paths (unchanged).

Gate

bun run lint clean · bun run build clean · bun test 1164 pass / 0 fail (63 new). Measured-entropy assertions pin the detection boundary.

Flagged out-of-scope gap

memory_add does not call scrub() today — out of scope for #156 (#51 territory); recommend a follow-up issue.

Do not merge — awaiting Themis's re-attack.

…entropy guard (#156) Add a SEPARATE content-threat detection layer beside scrub() (write-safety.ts is untouched: SECRET_PATTERNS and scrub() unchanged). Canonical, self-contained hooks/lib/threat-detect.ts re-exported via src/lib/threat-detect.ts (mirrors the #50/#157 seam) — one pattern source consumed by CLI, MCP, and hooks. Precision is earned structurally (this is the #137-class false-positive minefield): all 11 injection/exfil regexes are flag-tier ONLY, so an over-matching regex can never mutate or block a legit note — at most a benign annotation. The only mutating tier is redact (anchorless high-entropy token, closing the PR #128 residual); the only blocking action is redact-tier on the explicit memory_add path. - 11 flag-tier regexes: instruction-override, role-hijack, env-exfil, dotenv/netrc/ssh reads. File-read patterns require a shell verb governing the path and env-exfil requires curl/wget + a secret-named interpolation, so prose mentioning ~/.ssh, a .env file, or DB_URL stays clean. - Anchorless high-entropy token (redact-tier) via a layered FP-guard chain: exclude +/= (base64/data-URI), length 28-72, pure-hex (git SHA/md5/sha256), UUID, non-3-class / digit-sparse strings, word-structured identifiers (long single-case letter run), then a Shannon-entropy floor. The threshold is pinned by MEASURED fixture values in the test corpus, not a magic number. - Per-path policy: session + import-legacy redact (never block — no block-storms); memory_add blocks a suspected bare credential with a fixable error. - Wired AFTER scrub() on the two scrub paths; memory_add wired so the block tier works (memory_add still lacks scrub() — out of scope for #156, follow-up). Tests: tests/hooks/threat-detect.test.ts (TP + must-not-block corpus, measured entropy assertions, per-path policy, flag-is-byte-identical) and tests/lib/threat-detect-single-source.test.ts (src re-export === hooks canonical).

…ate/block (#156) RedTeam NO-GO + Ed's ruling: you cannot distinguish a random SECRET from a random PUBLIC identifier by entropy + shape — base64url (incl. the universal JWT header eyJhbGciOi…), base58 IPFS CIDs, and long nanoids all look identical to an anonymous secret. Redacting/blocking on that signal silently corrupts public, non-secret content. So the entire threat-detect layer becomes DETECT-AND-SURFACE: it never mutates persisted text and never blocks a write. Mostly deletion: - Remove the redact tier (span replacement / [THREAT-REDACTED] marker) and the block tier (write rejection) entirely. scanForThreats, resolveAction, the per-path policy matrix, and the IngestionPath/ResolvedThreat/ThreatScanResult/ ThreatAction/ThreatSeverity types are gone. - KEEP detection unchanged: detectThreats(text) still returns flag findings (category + span) for the 11 injection/exfil regexes AND anonymous high-entropy tokens. The entropy structural guards remain as flag-noise reduction (not a safety boundary now). - Surfacing per path (no content injection): extract-core returns findings in result.threats; import-legacy logs a non-fatal [WARN]; memory_add logs a non-fatal stderr line. Every persisted record is byte-identical on every path. Accepted tradeoff (documented in the module header): anonymous high-entropy secrets are FLAGGED, not redacted; anchored known-prefix secrets remain handled by scrub() on the scrub paths (unchanged). scrub()/SECRET_PATTERNS untouched. Tests: base64url token reclassified to flag-and-persist-unchanged; added the literal JWT header, a base58 IPFS CIDv0, and a length-31 nanoid as flag-only/persist-byte-identical cases; locked the deletion (no scanForThreats / no redact-block surface). bun test green, lint + build clean.

edheltzel added 2 commits June 24, 2026 11:01

edheltzel merged commit 84f0b0a into main Jun 24, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(threat-detect): injection/exfil detection with severity tiers + entropy guard (#156)#180

feat(threat-detect): injection/exfil detection with severity tiers + entropy guard (#156)#180
edheltzel merged 2 commits into
mainfrom
worktree-feat+156-injection-exfil-detection

edheltzel commented Jun 24, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

edheltzel commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Ruling: FLAG ONLY — never mutates, never blocks (RedTeam NO-GO + Ed)

What it does

Surfacing per path (no content injection)

Precision is now total

Accepted tradeoff (module header)

Gate

Flagged out-of-scope gap

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

edheltzel commented Jun 24, 2026 •

edited

Loading