AgenticRed is an automated pipeline that leverages LLMs’ in-context learning to iteratively design and refine red-teaming systems without human intervention, using each candidate system’s performance metric as feedback.
```
.
├── server/     # Scripts to start attacker, classifier, and defender model servers
├── search.sh   # Main script to run the search process
├── eval.sh     # Main script to run the evaluation process
└── README.md
```
- If you want to host your models locally, you need at least 3 GPUs (e.g., 3× L40) or equivalent capacity:
- One for the attacker model server
- One for the classifier / judge model server
- One for the defender / target model server
- If you want to call API endpoints instead, we provide OpenRouter/OpenAI API clients.
You can run local servers (recommended for experiments) or point to external APIs.
Inside `server/`, you should have scripts or configs such as:

```
cd server/
bash attacker_server.sh    # Starts attacker model server
bash classifier_server.sh  # Starts classifier / guardrail model server
bash defender_server.sh    # Starts defender / target model server
```

Document the actual ports and endpoints so the rest of the pipeline can reference them, e.g. `http://127.0.0.1:8000/v1`.
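The ports below are assumptions (one OpenAI-compatible server per role; the repo does not specify which ports its scripts bind). A small helper keeps the endpoint URLs consistent across the pipeline:

```shell
# Assumed port layout for the three local servers (adapt to your scripts):
#   attacker -> 8000, classifier -> 8001, defender -> 8002
ATTACKER_PORT=8000
CLASSIFIER_PORT=8001
DEFENDER_PORT=8002

# Build an OpenAI-compatible base URL for a given port
base_url() { echo "http://127.0.0.1:${1}/v1"; }

ATTACKER_ENDPOINT=$(base_url "$ATTACKER_PORT")
CLASSIFIER_ENDPOINT=$(base_url "$CLASSIFIER_PORT")
DEFENDER_ENDPOINT=$(base_url "$DEFENDER_PORT")
echo "$ATTACKER_ENDPOINT"
```

Exporting these variables once lets the search and eval configs reference the same endpoints without hard-coding URLs in several places.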
Instead of local servers, you can configure the system to call:
- OpenAI APIs
- OpenRouter APIs
Set your API keys:

```
export OPENAI_API_KEY=''
export GEMINI_API_KEY=''
export DEEPSEEK_API_KEY=''
export OPENROUTER_API_KEY=''
```
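A minimal preflight check, assuming only that these are the environment variable names the pipeline reads, can catch a missing key before a long run:

```shell
# Warn about any unset (or empty) API keys before launching a run.
# Uses bash indirect expansion ${!key} to look up each name.
check_keys() {
  for key in "$@"; do
    if [ -z "${!key:-}" ]; then
      echo "warning: $key is not set"
    fi
  done
}

check_keys OPENAI_API_KEY GEMINI_API_KEY DEEPSEEK_API_KEY OPENROUTER_API_KEY
```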
The search step iteratively designs new red-teaming systems.
Typical usage:

```
cd _redteam
bash scripts/search.sh
```
e.g. `bash search.sh --expr 1`

Key arguments (adapt to your implementation):
- `--expr`: Experiment index corresponding to different configurations.
- `--config`: Path to a config file specifying:
  - Attacker, classifier, and defender endpoints
  - Search hyperparameters (iterations, beam width, budgets, etc.)
  - Output directory for JSON logs (`save_dir`)
  - Whether to enable OpenRouter
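The field names below are hypothetical (the source does not specify the config schema); a config covering the items above might look like:

```shell
# Write a sample config file. All field names here are illustrative,
# not the repo's actual schema -- adapt them to your implementation.
cat > /tmp/agenticred_config.json <<'EOF'
{
  "attacker_endpoint": "http://127.0.0.1:8000/v1",
  "classifier_endpoint": "http://127.0.0.1:8001/v1",
  "defender_endpoint": "http://127.0.0.1:8002/v1",
  "iterations": 20,
  "beam_width": 4,
  "attack_budget": 100,
  "save_dir": "results/expr1",
  "use_openrouter": false
}
EOF
```

If your implementation takes a `--config` flag, this would then be passed as `bash scripts/search.sh --config /tmp/agenticred_config.json`.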
After search completes, evaluate the discovered attacks on a separate evaluation dataset and target model / benchmark.
```
cd _redteam
bash scripts/eval.sh
```

Key arguments (adapt to your implementation):
- `evaluator_model`: Model(s) used for evaluation.
- `benchmark`: Currently only HarmBench and StrongREJECT are implemented.
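Assuming the JSON logs record one attempt per line with a boolean `success` field (an assumption; the actual log schema may differ), the attack success rate can be summarized with standard tools:

```shell
# Fake log in an assumed JSONL format: one attempt per line, "success" flag per attempt.
cat > /tmp/eval_log.jsonl <<'EOF'
{"prompt_id": 1, "success": true}
{"prompt_id": 2, "success": false}
{"prompt_id": 3, "success": true}
EOF

total=$(grep -c '' /tmp/eval_log.jsonl)                # total attempts (line count)
hits=$(grep -c '"success": true' /tmp/eval_log.jsonl)  # successful attacks
echo "attack success rate: $hits/$total"
```

For the sample log above this prints `attack success rate: 2/3`.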
```
# 1. Start servers if you are hosting local servers
bash server/attacker_server.sh
bash server/defender_server.sh
bash server/classifier_server.sh

# 2. Run search
bash _redteam/search.sh --expr 1

# 3. Run evaluation
bash _redteam/eval.sh
```

```
@misc{yuan2026agenticredoptimizingagenticsystems,
      title={AgenticRed: Optimizing Agentic Systems for Automated Red-teaming},
      author={Jiayi Yuan and Jonathan Nöther and Natasha Jaques and Goran Radanović},
      year={2026},
      eprint={2601.13518},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.13518},
}
```