UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection
-
Updated
Sep 28, 2025 - Python
UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection
DeepTeam is a framework to red team LLMs and LLM systems.
Decrypted Generative Model safety files for Apple Intelligence containing filters
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
Attack to induce LLMs within hallucinations
Papers about red teaming LLMs and Multimodal models.
Reading list for adversarial perspective and robustness in deep reinforcement learning.
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Papers from our SoK on Red-Teaming (Accepted at TMLR)
NeurIPS'24 - LLM Safety Landscape
Restore safety in fine-tuned language models through task arithmetic
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"
Comprehensive LLM testing suite for safety, performance, bias, and compliance, equipped with methodologies and tools to enhance the reliability and ethical integrity of models like OpenAI's GPT series for real-world applications.
Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers"
[EMNLP 2025] Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"
Some Thoughts on AI Alignment: Using AI to Control AI
DSPy framework for detecting and preventing safety override cascades in LLM systems. Research-grade implementation for studying when completion urgency overrides safety constraints.
[COLM 2025] LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
Source code for "Opt-Out: Investigating Entity-Level Unlearning for Large Language Models via Optimal Transport", ACL 2025
Add a description, image, and links to the llm-safety topic page so that developers can more easily learn about it.
To associate your repository with the llm-safety topic, visit your repo's landing page and select "manage topics."