Internal Safety Collapse: turning an LLM or AI agent into a sensitive-data generator.
Updated May 29, 2026 - Python
Deterministic safety solutions for probabilistic AI agents
Introducing XSafeClaw: The Open-Source Agent Safety Platform from Fudan University
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 500+ Papers | Perception, Cognition, Planning, Interaction, Agentic System
Ethicore Engine™ is an AI safety, ethics, and compliance platform. This repo contains the open-source components of Ethicore Engine™ (the Guardian SDK), designed to protect your AI applications from prompt injection, jailbreaks, role hijacking, system-prompt extraction, and 100+ additional threat categories through a multi-layer analysis pipeline.
Practices, protocols, and skills for AI-driven software development. 18 skills + 1 Bash safety hook for Claude Code, Codex CLI, OpenCode, Cursor, Gemini CLI, Antigravity, and any agent supporting the Agent Skills standard.
An open taxonomy and scoring framework for evaluating AI agent sandboxes: 7 defense layers, 7 threat categories, 3 evaluation dimensions, 27 "sandboxes" scored.
Fast local Rust scanner for AI-agent prompt injection, credential leaks, exfiltration, and risky tool calls
Trust nothing. Ship safely. — Skeptical-reading and prompt-injection defense skill for AI agents. Provenance tagging, red-flag patterns, refusal templates, and a read-only injection auditor. MIT.
Claude Code agent-in-container orchestration and automation
Human-in-the-loop execution for LLM agents
OpenClaw-compatible MASL safety gate with public RAG packs for memory-aware AI agents
The open standard for runtime agent control — declarative hooks, policy enforcement, and observability across AI agent frameworks.
Security scanner for AI agent tool definitions
Guardrails service for AI agents. Default-deny tool call evaluation with LLM safety analysis, priority-ordered decision matrix, and human-in-the-loop escalations. Session recording, behavioral analysis, MCP proxy, secret redaction, and real-time audit.
The open-source safety layer for AI agents — block unsafe tool calls, require approval, enforce budgets, audit, replay.
Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).
Deterministic execution authorization for AI agents
🛡️ Safe AI Agents through Action Classifier