This is the repository for the paper GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning.
This is an LLM agent designed to act as a guardrail for other LLM agents. Specifically, GuardAgent oversees a target LLM agent by checking whether its inputs/outputs satisfy a set of given guard requests (e.g., safety rules or privacy policies) defined by the users.
You can find our project page here.
We recommend running our code in an environment with python>=3.9
. Please run
pip install -r requirements.txt
to install necessary packages.
Please update your API key in ./config.py
.
We support the protection of the EhrAgent based on EICU-AC and the SeeAct based on Mind2Web-SC, and we provide the corresponding dataset. Please refer to this link to learn more about and download dataset。
You can run the following command to use our GuardAgent:
python main.py --llm YOUR_LLM --agent TYOUR_AGENT --seed YOUR_RANDOM_SEED --num_shots YOUR_NUM_SHOTS --logs_path YOUR_LOGS_PATH --dataset_path YOUR_DATASET_PATH
Follows are explanations of each parameter:
--llm
: The backbone LLM of the GuardAgent you wish to use. You need to set the corresponding API key in ./config.py
.
--agent
: The LLM Agent you wish to protect. Currently, there are two options: ehragent
and seeact
.
--seed
: The seed used for random shuffle.
--num_shots
: The number of shots used by the GuardAgent when generating code. You can choose 1, 2, or 3.
--logs_path
: The output path for GuardAgent.
--dataset_path
: The path to the dataset that the GuardAgent accepts as input. You should set this parameter to the appropriate path after downloading the dataset (see Download Dataset).
Alternatively, you can set the default values for these options in ./main.py
and then run python main.py
.
If this repository is helpful to you, please consider citing:
@misc{xiang2024guardagent,
title={GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning},
author={Zhen Xiang and Linzhi Zheng and Yanjie Li and Junyuan Hong and Qinbin Li and Han Xie and Jiawei Zhang and Zidi Xiong and Chulin Xie and Carl Yang and Dawn Song and Bo Li},
year={2024},
eprint={2406.09187},
archivePrefix={arXiv}}