Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
-
Updated
Dec 24, 2025 - Python
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)
Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
🔍 Investigate LLM agent jailbreaking using a dual-agent framework to analyze persuasive strategies and model resistance in a controlled environment.
🔍 Benchmark jailbreak resilience in LLMs with JailBench for clear insights and improved model defenses against jailbreak attempts.
Chain-of-thought hijacking via template token injection for LLM censorship bypass (GPT-OSS)
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
The Self-Hosted AI Firewall & Gateway. Drop-in guardrails for LLMs running entirely on CPU. Blocks jailbreaks, enforces policies, and ensures compliance in real-time
Debugged version for Tree of Attacks: Jailbreaking Black-Box LLMs Automatically paper and added GPU optimization.
Add a description, image, and links to the llm-jailbreaks topic page so that developers can more easily learn about it.
To associate your repository with the llm-jailbreaks topic, visit your repo's landing page and select "manage topics."