AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10.
-
Updated
Jun 29, 2026 - Python
AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10.
A curated list of awesome responsible machine learning resources.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection
Safety-first guardrails for AI-driven cloud and Kubernetes operations
Agentlens is a trusted agent trading platform. Here, you can quickly find the Agent that meets your needs, and you can also publish your own Agent to turn it into your digital asset. We encourage everyone to transform their areas of expertise into Agents and turn them into digital assets, allowing others to see your unique strengths.
A resource repository for machine unlearning in large language models
Catch your AI's mistakes and blind spots before your customers or regulators do. iFixAi runs 45 inspections, 32 graded core plus 13 extended for frontier risks like sabotage, sandbagging, and oversight evasion. It returns a letter grade in under 5 minutes. Industry and model agnostic.
Deliver safe & effective language models
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
The open agent control plane. Govern autonomous AI agents with pre-execution policy enforcement, approval gates, and audit trails. Works with LangChain, CrewAI, MCP, and any framework.
Stop AI agents from doing things you didn't ask for.
Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours
Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
Runtime policy enforcement for AI agents. Cryptographic audit trail, human-in-the-loop approvals, kill switch. Zero code changes.
Aligning AI With Shared Human Values (ICLR 2021)
👾 下一代透明智能体架构 | Next-Gen Transparent Agent Architecture 🔍 全行为审计 | 🛡️ 两段式安全调用 | 🧠 双水位记忆 | ⏰ 心跳任务 📊 P0 级事故率降低 80% | 兼容 OpenClaw + Claude Code 技能生态
Self-hosted, OpenAI-compatible AI gateway for private RAG, natural-language data access, and tool-calling agents.
Add a description, image, and links to the ai-safety topic page so that developers can more easily learn about it.
To associate your repository with the ai-safety topic, visit your repo's landing page and select "manage topics."