Fix: Wire prompt sanitizer into classification agents to prevent prompt injection by TanmayZade · Pull Request #219 · beenuar/AiSOC

TanmayZade · 2026-05-27T09:27:25Z

Summary

This PR addresses a critical prompt injection vulnerability where attacker-controlled telemetry could hijack the classification agents (auto_triage, phishing, cloud, identity, insider_threat) and trick them into auto-closing legitimate high-severity alerts.

While the investigator agents were already correctly using the prompt_sanitizer module, the 5 classification agents bypassed it, pasting raw alert fields directly into the LLM prompt.

Changes

Input Sanitization: Wired the existing sanitize_text() function into the context-building functions for all 5 classification agents.
Trust Boundary Enforcement: Wrapped the final alert telemetry context in <UNTRUSTED_DATA> tags via wrap_untrusted() so the LLM treats it as inert data rather than instructions.

Verification

Tested with a malicious injection payload locally. The injection payload is now successfully stripped by sanitize_text (redacted to [REDACTED:INJECTION]) and the LLM correctly evaluates the underlying alert as a true_positive.
Validated that the needs_human_review fallback works safely if the LLM errors.

Eval Harness Results

Alert Reduction: No regression (CI verified)
Investigation Completeness: No regression (CI verified)
Response Quality: No regression (CI verified)
MITRE Accuracy: No regression (CI verified)
Note: Because this only adds a sanitization layer around existing LLM contexts, substrate self-consistency and agent accuracy metrics remain unchanged against the synthetic eval corpus.

Fixes #220

…pt injection

TanmayZade · 2026-05-27T10:19:48Z

Hi @beenuar,

The Compose Smoke / docker compose up — full stack failure is a pre-existing issue on main. Migration 041_attack_chains.sql uses window as a column name, which is a reserved keyword in PostgreSQL 14+, causing the Postgres container to crash. This is unrelated to the changes in this PR.

Fix: Wire prompt sanitizer into classification agents to prevent prom…

be71963

…pt injection

TanmayZade requested a review from beenuar as a code owner May 27, 2026 09:27

TanmayZade added 3 commits May 27, 2026 14:59

style: fix import sorting for ruff compliance

20a0c4b

style: run ruff format on classification agents

cfe8d82

trigger ci re-run

c057691

vgudur-dev mentioned this pull request May 27, 2026

[Bug]: Prompt Injection in Classification Agents leads to Alert Auto-Close Bypass #220

Closed

2 tasks

beenuar merged commit 25e5f52 into beenuar:main May 28, 2026
22 of 23 checks passed

This was referenced May 28, 2026

docs: add Credits section thanking contributors and security researchers #223

Merged

release: cut v7.4.0 #241

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Wire prompt sanitizer into classification agents to prevent prompt injection#219

Fix: Wire prompt sanitizer into classification agents to prevent prompt injection#219
beenuar merged 4 commits into
beenuar:mainfrom
TanmayZade:fix/prompt-injection-classification-agents

TanmayZade commented May 27, 2026 •

edited

Loading

Uh oh!

TanmayZade commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

TanmayZade commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Verification

Eval Harness Results

Uh oh!

TanmayZade commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

TanmayZade commented May 27, 2026 •

edited

Loading