llm-jailbreaks

Here are 11 public repositories matching this topic...

msoedov / agentic_security

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

agent-framework ai-red-team prompt-testing llm-security llm-vulnerabilities llm-evaluation llm-fuzzing llm-evaluation-framework llm-guardrails llm-scanner llm-jailbreaks llm-fuzzer llm-fuzzer-aggregator agent-security

Updated Dec 24, 2025
Python

CryptoAILab / JailbreakEval

Star

[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.

llm-safety llm-jailbreaks

Updated Apr 1, 2025
Python

whitecircle-ai / circle-guard-bench

Star

First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)

benchmarking benchmark ai jailbreak safeguard guardrail guardrails large-language-models llm large-language-model llm-security llm-eval llm-evaluation llm-as-a-judge llm-jailbreaks

Updated Dec 3, 2025
Python

UCSB-NLP-Chang / SemanticSmooth

Star

Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'

large-language-models llm-jailbreaks

Updated Jun 9, 2024
Python

yiksiu-chan / SpeakEasy

Star

[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

machine-learning ai-safety large-language-models llm-jailbreaks

Updated Sep 27, 2025
Python

DifABD / Agent-Jailbreaking-Agents

Star

🔍 Investigate LLM agent jailbreaking using a dual-agent framework to analyze persuasive strategies and model resistance in a controlled environment.

sqlalchemy feedback jailbreak malware loop poc gpt messages lineage working ai-agents adversarial-attacks chatgpt llama-cpp chatgpt-jailbreak gpt5 llm-jailbreaks llm-malware-generation

Updated Jan 8, 2026
Python

RafaelParonis / jailbench

Star

🔍 Benchmark jailbreak resilience in LLMs with JailBench for clear insights and improved model defenses against jailbreak attempts.

python flask analytics openai alignment model-evaluation ai-safety security-testing red-teaming model-robustness anthropic litellm content-safety llm-jailbreaks tool-calling llm-benchmark ai-evals textual-tui

Updated Jan 8, 2026
Python

4n4s4zi / llm-jailbreaking

Star

Chain-of-thought hijacking via template token injection for LLM censorship bypass (GPT-OSS)

llm-jailbreaks gpt-oss

Updated Sep 27, 2025
Python

vibheksoni / jailbench

Star

Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.

Updated Aug 12, 2025
Python

lorenzomaiuri-dev / svalinn-ai

Star

The Self-Hosted AI Firewall & Gateway. Drop-in guardrails for LLMs running entirely on CPU. Blocks jailbreaks, enforces policies, and ensures compliance in real-time

Updated Jan 6, 2026
Python

1lmao / TAP-Tree-of-Attacks-with-Pruning

Star

Debugged version for Tree of Attacks: Jailbreaking Black-Box LLMs Automatically paper and added GPU optimization.

llm-jailbreaks

Updated Nov 24, 2025
Python

Improve this page

Add a description, image, and links to the llm-jailbreaks topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the llm-jailbreaks topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llm-jailbreaks

Here are 11 public repositories matching this topic...

msoedov / agentic_security

CryptoAILab / JailbreakEval

whitecircle-ai / circle-guard-bench

UCSB-NLP-Chang / SemanticSmooth

yiksiu-chan / SpeakEasy

DifABD / Agent-Jailbreaking-Agents

RafaelParonis / jailbench

4n4s4zi / llm-jailbreaking

vibheksoni / jailbench

lorenzomaiuri-dev / svalinn-ai

1lmao / TAP-Tree-of-Attacks-with-Pruning

Improve this page

Add this topic to your repo