FIDES: deterministic prompt-injection defense, coming soon to Agent Framework #5624
This is a strong direction. Deterministic information-flow control is exactly the kind of defense agent frameworks need because prompt-only defenses leave too much to model behavior. On the feedback points:
The main evidence I would want from the framework is a compact decision record: input labels, propagated labels, policy checked, decision, tool/action name, and reason. That makes the defense usable for audit, incident review, and regression testing.
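To make the decision-record idea concrete, here is a minimal sketch of what such a record could look like. The `DecisionRecord` class and its field names are hypothetical, chosen to mirror the list above; they are not types from `agent_framework.security`, and the label strings are only illustrative.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical audit record for one policy check (not a FIDES type)."""

    tool_name: str                      # tool/action the agent tried to invoke
    input_labels: tuple[str, ...]       # labels on the raw inputs
    propagated_labels: tuple[str, ...]  # labels after flowing through the context
    policy_checked: str                 # which rule was evaluated
    decision: str                       # "allow" or "deny"
    reason: str                         # human-readable explanation

    def to_json(self) -> str:
        # Sorted keys keep the output stable, which helps regression tests.
        return json.dumps(asdict(self), sort_keys=True)


record = DecisionRecord(
    tool_name="post_comment",
    input_labels=("untrusted", "public"),
    propagated_labels=("untrusted", "private"),
    policy_checked="max_allowed_confidentiality=public",
    decision="deny",
    reason="private data cannot flow to a public sink",
)
print(record.to_json())
```

Because the record is immutable and serializes deterministically, the same attack replayed in a regression test should yield a byte-identical line, which is one way to check that a policy change did not silently alter a decision.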
Hi all 👋
FIDES (Flow Integrity Deterministic Enforcement System) is landing in agent-framework-core (as of v1.3.0) under agent_framework.security. It is an information-flow-control defense against prompt injection and data exfiltration, based on the FIDES paper by Costa et al. and tracked in ADR-0024.
The problem in one screenshot
Consider a triage agent that reads issues and can call read_file / post_comment. An attacker opens a normal-looking bug report with a footer:
A defensive system prompt lowers the success rate of known attacks. It doesn't make the next one impossible. The agent only has to be wrong once.
What FIDES does
Content: integrity (trusted/untrusted) and confidentiality (public/private/user_identity).
additional_properties (accepts_untrusted=False, max_allowed_confidentiality="public").
Opt-in via a single SecureAgentConfig context provider on your existing Agent.
Try it
python/samples/02-agents/security/: email_security_example.py (prompt injection) and repo_confidentiality_example.py (data exfiltration).
What we'd like feedback on
Does the additional_properties decorator approach feel right, or would you prefer a dedicated SecurityLabel argument on @tool?
SecureAgentConfig keeps existing agents unchanged; should some labels apply automatically (e.g. tool results from MCP servers being untrusted by default)?
Co-authored with @shrutitople.
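For readers new to information-flow control, the labelling and tool-policy check described under "What FIDES does" can be sketched in a few lines of framework-independent Python. Everything here is an assumption made for illustration: the `Label` class, the `join` and `check_tool_call` helpers, and the ordering public < private < user_identity are hypothetical and are not the `agent_framework.security` API. Only the label values and the two policy names (`accepts_untrusted`, `max_allowed_confidentiality`) come from the post.

```python
from __future__ import annotations

from dataclasses import dataclass

# Confidentiality levels named in the post, least to most restrictive.
# The ordering is an assumption of this sketch.
CONFIDENTIALITY_ORDER = ["public", "private", "user_identity"]


@dataclass(frozen=True)
class Label:
    """Hypothetical label: integrity plus confidentiality."""

    trusted: bool
    confidentiality: str


def join(a: Label, b: Label) -> Label:
    """Combine two labels: untrusted if either is, and the more
    restrictive confidentiality wins."""
    conf = max(a.confidentiality, b.confidentiality,
               key=CONFIDENTIALITY_ORDER.index)
    return Label(trusted=a.trusted and b.trusted, confidentiality=conf)


def check_tool_call(context: Label, accepts_untrusted: bool,
                    max_allowed_confidentiality: str) -> tuple[bool, str]:
    """Deterministic check of a tool's declared policy against the
    label of everything the model has seen so far."""
    if not context.trusted and not accepts_untrusted:
        return False, "context contains untrusted content"
    level = CONFIDENTIALITY_ORDER.index(context.confidentiality)
    limit = CONFIDENTIALITY_ORDER.index(max_allowed_confidentiality)
    if level > limit:
        return False, "context is more confidential than the tool allows"
    return True, "ok"


# Triage scenario: a trusted system prompt, an attacker-written issue,
# and a private file the agent has read.
system_prompt = Label(trusted=True, confidentiality="public")
issue_body = Label(trusted=False, confidentiality="public")
secret_file = Label(trusted=True, confidentiality="private")

context = join(join(system_prompt, issue_body), secret_file)
allowed, reason = check_tool_call(
    context, accepts_untrusted=False, max_allowed_confidentiality="public"
)
print(allowed, reason)  # False context contains untrusted content
```

The point of the sketch is that the decision depends only on labels and declared policy, never on what the model was persuaded to do, which is what makes the defense deterministic.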