feat(threat-detect): injection/exfil detection with severity tiers + entropy guard (#156)#180
Merged
Merged
Conversation
…entropy guard (#156) Add a SEPARATE content-threat detection layer beside scrub() (write-safety.ts is untouched: SECRET_PATTERNS and scrub() unchanged). Canonical, self-contained hooks/lib/threat-detect.ts re-exported via src/lib/threat-detect.ts (mirrors the #50/#157 seam) — one pattern source consumed by CLI, MCP, and hooks. Precision is earned structurally (this is the #137-class false-positive minefield): all 11 injection/exfil regexes are flag-tier ONLY, so an over-matching regex can never mutate or block a legit note — at most a benign annotation. The only mutating tier is redact (anchorless high-entropy token, closing the PR #128 residual); the only blocking action is redact-tier on the explicit memory_add path. - 11 flag-tier regexes: instruction-override, role-hijack, env-exfil, dotenv/netrc/ssh reads. File-read patterns require a shell verb governing the path and env-exfil requires curl/wget + a secret-named interpolation, so prose mentioning ~/.ssh, a .env file, or DB_URL stays clean. - Anchorless high-entropy token (redact-tier) via a layered FP-guard chain: exclude +/= (base64/data-URI), length 28-72, pure-hex (git SHA/md5/sha256), UUID, non-3-class / digit-sparse strings, word-structured identifiers (long single-case letter run), then a Shannon-entropy floor. The threshold is pinned by MEASURED fixture values in the test corpus, not a magic number. - Per-path policy: session + import-legacy redact (never block — no block-storms); memory_add blocks a suspected bare credential with a fixable error. - Wired AFTER scrub() on the two scrub paths; memory_add wired so the block tier works (memory_add still lacks scrub() — out of scope for #156, follow-up). Tests: tests/hooks/threat-detect.test.ts (TP + must-not-block corpus, measured entropy assertions, per-path policy, flag-is-byte-identical) and tests/lib/threat-detect-single-source.test.ts (src re-export === hooks canonical).
…ate/block (#156) RedTeam NO-GO + Ed's ruling: you cannot distinguish a random SECRET from a random PUBLIC identifier by entropy + shape — base64url (incl. the universal JWT header eyJhbGciOi…), base58 IPFS CIDs, and long nanoids all look identical to an anonymous secret. Redacting/blocking on that signal silently corrupts public, non-secret content. So the entire threat-detect layer becomes DETECT-AND-SURFACE: it never mutates persisted text and never blocks a write. Mostly deletion: - Remove the redact tier (span replacement / [THREAT-REDACTED] marker) and the block tier (write rejection) entirely. scanForThreats, resolveAction, the per-path policy matrix, and the IngestionPath/ResolvedThreat/ThreatScanResult/ ThreatAction/ThreatSeverity types are gone. - KEEP detection unchanged: detectThreats(text) still returns flag findings (category + span) for the 11 injection/exfil regexes AND anonymous high-entropy tokens. The entropy structural guards remain as flag-noise reduction (not a safety boundary now). - Surfacing per path (no content injection): extract-core returns findings in result.threats; import-legacy logs a non-fatal [WARN]; memory_add logs a non-fatal stderr line. Every persisted record is byte-identical on every path. Accepted tradeoff (documented in the module header): anonymous high-entropy secrets are FLAGGED, not redacted; anchored known-prefix secrets remain handled by scrub() on the scrub paths (unchanged). scrub()/SECRET_PATTERNS untouched. Tests: base64url token reclassified to flag-and-persist-unchanged; added the literal JWT header, a base58 IPFS CIDv0, and a length-31 nanoid as flag-only/persist-byte-identical cases; locked the deletion (no scanForThreats / no redact-block surface). bun test green, lint + build clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #156. A detect-and-surface content-threat layer beside
scrub()—SECRET_PATTERNS/scrub()are untouched. Canonicalhooks/lib/threat-detect.tsre-exported viasrc/lib/threat-detect.ts(mirrors the #50/#157 seam — one pattern source for CLI + MCP + hooks).Ruling: FLAG ONLY — never mutates, never blocks (RedTeam NO-GO + Ed)
The original entropy tier redacted/blocked anonymous high-entropy tokens. RedTeam proved this corrupts public, non-secret tokens — you can't tell a random secret from a random public ID by entropy + shape (base64url incl. the universal JWT header
eyJhbGciOi…, base58 IPFS CIDs, long nanoids all look identical). Ed ruled: demote to flag everywhere. The layer now only surfaces findings; every persisted record is byte-identical on every path, and no write is ever blocked. (Net: mostly deletion — −308/+216.)What it does
KEY/TOKEN/SECRET-named interpolation — so prose mentioning~/.ssh, a.envfile, orDB_URLstays clean.+/=/base64, length 28–72, pure-hex SHAs, UUID, 3-class + ≥3 digits, letter-run, Shannon ≥4.0) are now flag-noise reduction, not a safety boundary.Surfacing per path (no content injection)
extract-core): findings inresult.threats[WARN]logPrecision is now total
Nothing mutates or blocks, so a false positive is at worst a spurious flag. Verified by tests:
~/.ssh/.env/DB_URL/ quoted"ignore previous instructions"→ clean or flag, never altered{category, span}, no action/replacement text) → persist byte-identical, never blockscanForThreats/ no redact-block surface is exportedAccepted tradeoff (module header)
Anonymous high-entropy secrets are flagged, not redacted; anchored known-prefix secrets remain
scrub()'s job on the scrub paths (unchanged).Gate
bun run lintclean ·bun run buildclean ·bun test1164 pass / 0 fail (63 new). Measured-entropy assertions pin the detection boundary.Flagged out-of-scope gap
memory_adddoes not callscrub()today — out of scope for #156 (#51 territory); recommend a follow-up issue.