This benchmark prompts a model to classify a given input by producing either the "safe" or the "unsafe" keyword (optionally followed by the list of violated policies). We want to benchmark models with the vLLM inference engine instead of the default Transformers backend. Because the benchmark extracts the logits of the safe and unsafe keywords from the first token generated by the LLM, a naive implementation would transfer logits over the full vocabulary via HTTP from the vLLM server to our client. This is prohibitively expensive (vLLM becomes about 6x slower than Transformers), so we only transfer the top-k logits (e.g. k = 10). Assuming the model always assigns large logits to the safe/unsafe keywords (it has been trained to do so), and given that the GuardBench evaluation pipeline only uses the relative ratio of the safe and unsafe logits to compute the F1 and Recall metrics, the scores obtained with top-k logits are guaranteed to match those obtained with full-vocabulary logits. This has been verified across all 40 benchmarks in the GuardBench repository.
- Set up the environment:
```
uv pip install -r llama4guard_vllm_requirements.txt
```
- Serve the model with:
```
vllm serve meta-llama/Llama-Guard-4-12B -tp 1 --api-key EMPTY --logprobs-mode processed_logits --max-logprobs 10 --max-model-len 131072
```
- Evaluate the model with:
```
python vllm_server_mm_eval.py --model meta-llama/Llama-Guard-4-12B --datasets all --output_dir output_dir/Llama-Guard-4-12B --top_logprobs 10
```
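To illustrate the top-k approach, here is a minimal client-side sketch (not the actual `vllm_server_mm_eval.py` implementation) that queries the server started above and converts the top-k logprobs of the first generated token into an unsafe probability. The literal `safe`/`unsafe` token strings and the assumption that the model assigns a large logit to at least one of them are simplifications:

```python
import math

from openai import OpenAI

# Connect to the vLLM server started above (OpenAI-compatible endpoint).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def unsafe_probability(conversation: list[dict[str, str]]) -> float:
    response = client.chat.completions.create(
        model="meta-llama/Llama-Guard-4-12B",
        messages=conversation,
        max_tokens=1,       # only the first generated token is needed
        logprobs=True,
        top_logprobs=10,    # must not exceed --max-logprobs on the server
    )
    top = response.choices[0].logprobs.content[0].top_logprobs
    # With --logprobs-mode processed_logits, each `logprob` field holds the
    # raw logit of the corresponding token.
    logits = {entry.token.strip().lower(): entry.logprob for entry in top}
    safe = logits.get("safe", float("-inf"))
    unsafe = logits.get("unsafe", float("-inf"))
    # Softmax over the two keywords: only their relative ratio matters.
    return math.exp(unsafe) / (math.exp(unsafe) + math.exp(safe))
```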
- [October 9, 2025] GuardBench now supports four additional datasets: JBB Behaviors, NicheHazardQA, HarmEval, and TechHazardQA. It also allows choosing the metrics to report at the end of the evaluation. Supported metrics are: `precision` (Precision), `recall` (Recall), `f1` (F1), `mcc` (Matthews Correlation Coefficient), `auprc` (AUPRC), `sensitivity` (Sensitivity), `specificity` (Specificity), `g_mean` (G-Mean), `fpr` (False Positive Rate), and `fnr` (False Negative Rate).
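For example, the metric keys above can be passed via the `metrics` argument of the `benchmark` function (shown in the usage example further below). A minimal sketch with a trivial placeholder moderation function:

```python
from guardbench import benchmark

def moderate(conversations, **kwargs):
    # Trivial placeholder: flag every conversation as maximally unsafe.
    return [1.0 for _ in conversations]

# Report MCC, AUPRC, and G-Mean in addition to the default F1 and Recall.
benchmark(
    moderate=moderate,
    model_name="Always-Unsafe Baseline",
    metrics=["f1", "recall", "mcc", "auprc", "g_mean"],
)
```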
GuardBench is a Python library for the evaluation of guardrail models, i.e., LLMs fine-tuned to detect unsafe content in human-AI interactions.
GuardBench provides a common interface to 40 evaluation datasets, which are downloaded and converted into a standardized format for improved usability.
It also lets you quickly compare results and export LaTeX tables for scientific publications.
GuardBench's benchmarking pipeline can also be leveraged on custom datasets.
GuardBench was featured in EMNLP 2024.
The related paper is available here.
GuardBench has a public leaderboard available on HuggingFace.
You can find the list of supported datasets here. A few of them require authorization. Please read this.
If you use GuardBench to evaluate guardrail models for your scientific publications, please consider citing our work.
- 40 datasets for guardrail model evaluation.
- Automated evaluation pipeline.
- User-friendly.
- Extendable.
- Reproducible and sharable evaluation.
- Exportable evaluation reports.
GuardBench requires Python >= 3.10 and can be installed with:
```
pip install guardbench
```
Minimal usage example:
```python
from guardbench import benchmark

def moderate(
    conversations: list[list[dict[str, str]]],  # MANDATORY!
    # additional `kwargs` as needed
) -> list[float]:
    # do moderation
    # return a list of floats (unsafe probabilities)
    ...  # placeholder so the stub is valid Python

benchmark(
    moderate=moderate,  # User-defined moderation function
    model_name="My Guardrail Model",
    batch_size=1,  # Default value
    datasets="all",  # Default value
    metrics=["f1", "recall"],  # Default value
    # Note: you can pass additional `kwargs` for `moderate`
)
```
- Follow our tutorial on benchmarking Llama Guard with GuardBench.
- More examples are available in the `scripts` folder.
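As an additional illustration (not taken from the tutorial), here is a minimal sketch of a `moderate` function that wraps an off-the-shelf Hugging Face text classifier. The model name, its label scheme, and the decision to score only the last message are illustrative assumptions:

```python
from transformers import pipeline
from guardbench import benchmark

# Illustrative choice of model: any text classifier that outputs a
# toxicity/unsafety score could be plugged in here.
classifier = pipeline("text-classification", model="unitary/toxic-bert", top_k=None)

def moderate(conversations: list[list[dict[str, str]]]) -> list[float]:
    # Score only the last message of each conversation (a simplification;
    # a real guardrail model would usually consider the full exchange).
    last_messages = [conversation[-1]["content"] for conversation in conversations]
    unsafe_probabilities = []
    for scores in classifier(last_messages, truncation=True):
        # Use the "toxic" label score as the unsafe probability.
        toxic = next((s["score"] for s in scores if s["label"] == "toxic"), 0.0)
        unsafe_probabilities.append(toxic)
    return unsafe_probabilities

benchmark(moderate=moderate, model_name="Toxic-BERT (illustrative)")
```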
Browse the documentation for more details about:
- The datasets and how to obtain them.
- The data format used by GuardBench.
- How to use the `Report` class to compare models and export results as LaTeX tables.
- How to leverage GuardBench's benchmarking pipeline on custom datasets.
You can find GuardBench's leaderboard here. If you want to submit your results, please contact us.
- Elias Bassani (European Commission - Joint Research Centre)
```bibtex
@inproceedings{guardbench,
title = "{G}uard{B}ench: A Large-Scale Benchmark for Guardrail Models",
author = "Bassani, Elias and
Sanchez, Ignacio",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.1022",
doi = "10.18653/v1/2024.emnlp-main.1022",
pages = "18393--18409",
}
```
Would you like to see other features implemented? Please open a feature request.
GuardBench is provided as open-source software licensed under EUPL v1.2.