Careful Queries, Credible Results: Teaching RAG Models Advanced Web Search Tools with Reinforcement Learning
WebFilter is a novel agentic RAG framework that generates source-restricted queries and filters out unreliable content. It combines a retrieval filtering mechanism with a behavior- and outcome-driven reward strategy, optimizing both query formulation and retrieval outcomes.
- [2025.11.9] 🎉 WebFilter-7B has been accepted to AAAI 2026!
- [2025.8.25] 🎉 WebFilter-7B is now live on Hugging Face Hub.
- [2025.8.12] 🎉 Our code repository is now open to everyone!
- [2025.8.12] 🎉 WebFilter is live on arXiv — check it out!
Extensive experiments demonstrate that WebFilter achieves state-of-the-art QA performance, with advanced search-operator usage rising from 10% to 75%, and significant gains across in-domain and out-of-domain benchmarks.
Furthermore, case studies demonstrate WebFilter’s capacity to reason about search scope, cross-verify information across diverse sources, and adapt its strategies to improve retrieval effectiveness.
Beyond these performance gains across a wide range of benchmarks, WebFilter also encourages the agent to use advanced query syntax far more frequently during the search process.
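For illustration only, the snippet below shows the kind of source-restricted, operator-rich queries this refers to. The example query strings are ours, written for clarity; they are not taken from WebFilter's training data or rollouts.

```python
# Illustrative source-restricted queries using common web-search operators.
# These strings are examples for clarity, not outputs from the model.
example_queries = [
    'Nobel Prize in Physics 2019 laureates site:nobelprize.org',          # restrict to an authoritative source
    '"AlphaFold 2" CASP14 results site:deepmind.com OR site:nature.com',  # combine multiple trusted sources
    'COVID-19 vaccine efficacy -site:pinterest.com',                      # exclude an unreliable source
]

for query in example_queries:
    print(query)
```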
WebFilter is now live on Hugging Face Hub — you’re welcome to explore it anytime:
| Model Name | HF Checkpoint | Size |
|---|---|---|
| WebFilter-7B | 🤗 dayll/WebFilter-7B | 7B |
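Below is a minimal sketch for loading the checkpoint with 🤗 Transformers. It assumes the checkpoint ships a chat template; the prompt and generation settings are illustrative, and full agentic search additionally requires the server handlers and system prompt described later in this README.

```python
# Minimal, illustrative loading of the released checkpoint with 🤗 Transformers.
# Assumes the checkpoint provides a chat template; settings below are not official defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dayll/WebFilter-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Who won the 2019 Nobel Prize in Physics?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```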
To begin using this repo, you need to install the required dependencies. You can do this by running the following commands:
```bash
git clone https://github.com/GuoqingWang1/WebFilter.git
conda create -n webfilter python=3.10
conda activate webfilter
cd webfilter
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
pip3 install -e .
pip3 install -r requirements.txt
```
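Optionally, a quick sanity check (our suggestion, not part of the repo) that the CUDA build of PyTorch and flash-attn installed correctly inside the `webfilter` environment:

```python
# Run inside the webfilter conda environment after installation.
import torch
import flash_attn  # noqa: F401  (import succeeds only if the wheel built correctly)

print(torch.__version__)          # expected: 2.4.0+cu124
print(torch.cuda.is_available())  # should be True on a CUDA-capable machine
```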
Run the following commands to launch the server handler:

- Modify `serper_api_key` or `azure_bing_search_subscription_key` & `search_engine` in `./scrl/handler/config.yaml`
- Add the `qwen-plus` API key in `./scrl/handler/server_handler.py`:
  ```python
  from openai import OpenAI

  # Configure the qwen-plus client with your own API key and endpoint.
  client = OpenAI(
      api_key="sk-xxx",
      base_url="xxxx"
  )
  ```
- Start the server handler:
  ```bash
  python ./scrl/handler/server_handler.py
  ```

After launching all server handlers, replace `server_url_list` in `./scrl/handler/config.yaml` on your training host node and then run:
```bash
python ./scrl/handler/handler.py
```

The system prompt for our model can be found in:
`./scrl/handler/config.yaml` (line 25)

If you'd like to train WebFilter yourself, just follow these steps:
Step 1: Choose a Reward Function
- WebFilter SR Reward
  ```
  ./verl/trainer/config/ppo_trainer.yaml  # line 152 -> reward_manager: SR_only
  ```
- WebFilter SR-RR Reward
  ```
  ./verl/trainer/config/ppo_trainer.yaml  # line 152 -> reward_manager: SR_RR
  ```

The Retrieval-precision Reward (RR) mode uses a larger, more powerful pretrained LLM to evaluate both the quality of results and the retrieval process (see the sketch after the setup commands below). To enable it, run:
```bash
conda create -n rrm_server --clone webfilter
conda activate rrm_server
pip install vllm==0.8.5
bash run_llm.sh
```
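For intuition only, here is a sketch of how a retrieval-precision judgment could be requested from the LLM served by `run_llm.sh` via vLLM's OpenAI-compatible API. The endpoint, judge model name, and prompt are assumptions for illustration; this is not the repository's actual RR implementation.

```python
# Conceptual sketch: ask a judge model (served through vLLM's OpenAI-compatible API)
# to score how well a retrieved passage supports a question.
# Endpoint, model name, and prompt are assumptions, not the repo's RR code.
from openai import OpenAI

judge = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")  # assumed vLLM endpoint

question = "Who won the 2019 Nobel Prize in Physics?"
passage = "James Peebles, Michel Mayor, and Didier Queloz shared the 2019 Nobel Prize in Physics."

response = judge.chat.completions.create(
    model="Qwen2.5-72B-Instruct",  # hypothetical judge model
    messages=[{
        "role": "user",
        "content": (
            "On a scale from 0 to 1, how well does the passage support answering the question?\n"
            f"Question: {question}\nPassage: {passage}\nReturn only the number."
        ),
    }],
)
print(response.choices[0].message.content)
```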
Step 2: Start Training

Once you've set the reward function, launch training with:

```bash
bash train_grpo.sh
```

Use the following command to generate a rollout:
```bash
bash evaluate.sh
```

You can find the rollout file in `./outputs/{project_name}/{experiment_name}/rollout/rollout_step_0.json`.
You can rename and copy it into `./evaluate/{experiment_name}_result.json`.
Then, run the following command:
```bash
python ./evaluate/cacluate_metrics.py {experiment_name}
```

You can check the score in `./evaluate/{experiment_name}_score.json`.
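If you prefer to script the copy-and-check steps above, here is a small illustrative helper (not part of the repo); the project and experiment names are placeholders you should replace with your own.

```python
# Illustrative helper: copy the rollout into evaluate/ under the expected name,
# then print the score file once cacluate_metrics.py has produced it.
import json
import os
import shutil

project_name = "my_project"        # placeholder: your project_name
experiment_name = "my_experiment"  # placeholder: your experiment_name

shutil.copy(
    f"./outputs/{project_name}/{experiment_name}/rollout/rollout_step_0.json",
    f"./evaluate/{experiment_name}_result.json",
)

# After running `python ./evaluate/cacluate_metrics.py {experiment_name}`,
# the score file appears next to the result file; print it if it exists.
score_path = f"./evaluate/{experiment_name}_score.json"
if os.path.exists(score_path):
    with open(score_path) as f:
        print(json.dumps(json.load(f), indent=2))
```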
WebFilter is inspired by DeepSeek-R1, with its implementation built upon veRL, Search-R1, and DeepResearcher. We sincerely thank the teams behind these projects for their valuable contributions to open-source research and development.
Grateful beyond words to the incredible contributors who made this project sparkle! ✨🙌💖
| Yuqin Dai | Guoqing Wang |
|---|---|
| 🐶❤😎 | 🏀💻🥳 |
```bibtex
@article{dai2025careful,
  title={Careful Queries, Credible Results: Teaching RAG Models Advanced Web Search Tools with Reinforcement Learning},
  author={Dai, Yuqin and Yang, Shuo and Wang, Guoqing and Deng, Yong and Zhang, Zhanwei and Yin, Jun and Zeng, Pengyu and Ying, Zhenzhe and Meng, Changhua and Yi, Can and others},
  journal={arXiv preprint arXiv:2508.07956},
  year={2025}
}
```



