R1-Router: Learning to Route Queries across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

Chunyi Peng¹, Zhipeng Xu¹, Zhenghao Liu¹, Yishan Li³, Yukun Yan², Zhiyuan Liu², Yu Gu¹, Minghe Yu¹, Ge Yu¹, Maosong Sun²

¹Northeastern University, ²Tsinghua University, ³ModelBest Inc.

If you find this project useful, please give us a star🌟.

Environment

For training, answer generation, and evaluation processes:

conda create -n router python=3.11
conda activate router
pip install -r requirements_router.txt

For retriever and corpus construction processes:

conda create -n retriever python=3.11
conda activate retriever
pip install -r requirements_retriever.txt

Corpora Construction

For the text corpus, download enwiki-20241020 from Hugging Face, then extract, preprocess, and index it with the following commands:

7z x enwiki-20241020-pages-articles-multistream.xml.zip.001 
conda activate retriever
wikiextractor enwiki-20241020-pages-articles-multistream.xml.bz2 -o wiki_extracted
python wiki_preprocess.py
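
To inspect what wikiextractor produced before running the preprocessing script, the sketch below parses the tool's default (non-JSON) output, in which each article is wrapped in a `<doc id=... url=... title=...>` header and a closing `</doc>` line. It is not the logic of wiki_preprocess.py, whose internals are not described here. The function name `iter_wiki_docs`, the attribute order in the regular expression, and the sample text are assumptions made for illustration.

```python
import re
from typing import Iterator

# Header line written by wikiextractor in its default output mode.
# The attribute order (id, url, title) is an assumption of this sketch.
DOC_HEADER = re.compile(
    r'<doc id="(?P<id>[^"]*)" url="(?P<url>[^"]*)" title="(?P<title>[^"]*)">'
)


def iter_wiki_docs(raw: str) -> Iterator[dict]:
    """Yield one dict (id, url, title, text) per <doc ...> ... </doc> block."""
    current = None
    lines = []
    for line in raw.splitlines():
        header = DOC_HEADER.match(line)
        if header:
            # Start of a new article: remember its metadata.
            current = header.groupdict()
            lines = []
        elif line.strip() == "</doc>" and current is not None:
            # End of the article: attach the collected body text.
            current["text"] = "\n".join(lines).strip()
            yield current
            current = None
        elif current is not None:
            lines.append(line)


# Hypothetical sample in the same shape as a wiki_extracted/AA/wiki_00 file.
sample = (
    '<doc id="12" url="https://en.wikipedia.org/wiki?curid=12" title="Anarchism">\n'
    "Anarchism\n\nAnarchism is a political philosophy.\n"
    "</doc>\n"
    '<doc id="25" url="https://en.wikipedia.org/wiki?curid=25" title="Autism">\n'
    "Autism\n\nAutism is a neurodevelopmental condition.\n"
    "</doc>\n"
)
docs = list(iter_wiki_docs(sample))
print([d["title"] for d in docs])
```

To apply it to real output, read each file under wiki_extracted and pass its contents to `iter_wiki_docs`.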

For the image corpus, you can directly download M-BEIR. To embed and index it, you can follow the repository.

For the table corpus, you can download, embed, and index Open-WikiTable following the repository, or you can directly download the version we have already preprocessed from here.

Retrievers Preparation

For the Text-Image Retriever, you can directly download UniIR.

For the Table Retriever, you can train it with the help of the repository, or you can download it directly from here.

Datasets

We have prepared all the text datasets in ./datasets. The images need to be downloaded separately:

InfoSeek: InfoSeek images can be downloaded from OVEN
Dyn-VQA: Dynamic VQA images can be downloaded from DynVQA_en.202412
WebQA: WebQA images can be downloaded from Google Drive

Training

If you do not want to train the model, you can download R1-Router and skip ahead to Evaluation.

Data Synthesis

If you want to use the ready-to-use synthetic data directly, you can skip ahead to Step-GRPO Training.

First, we need to synthesize the data step by step:

bash src/data_synthesis/data_synthesis.sh

Step-GRPO Training

Our training framework is based on EasyR1. All you need to do is download it and replace the corresponding files with the files in ./Easy-R1. Then start training with the command:

conda activate router
bash examples/run_qwen2_5_vl_7b_stepgrpo.sh
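
The file replacement step described above can be sketched as a directory overlay: every file under ./Easy-R1 is copied onto the same relative path inside the downloaded EasyR1 checkout, overwriting what is there and leaving all other files alone. This is one possible way to do it, not a script shipped with the repository. The helper name `apply_overlay`, the directory locations, and the placeholder file names in the demo are all assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def apply_overlay(overlay_dir: Path, target_dir: Path) -> list[str]:
    """Copy every file under overlay_dir into target_dir, overwriting matches.

    Returns the sorted relative paths of the files that were written.
    """
    written = sorted(
        p.relative_to(overlay_dir).as_posix()
        for p in overlay_dir.rglob("*")
        if p.is_file()
    )
    # dirs_exist_ok=True (Python 3.8+) merges into the existing tree
    # instead of raising because target_dir already exists.
    shutil.copytree(overlay_dir, target_dir, dirs_exist_ok=True)
    return written


# Demo on throwaway directories; the file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    overlay = Path(tmp) / "Easy-R1"   # stands in for ./Easy-R1 from this repo
    target = Path(tmp) / "EasyR1"     # stands in for the downloaded EasyR1
    (overlay / "examples").mkdir(parents=True)
    (target / "examples").mkdir(parents=True)
    (overlay / "examples" / "run.sh").write_text("patched\n")
    (target / "examples" / "run.sh").write_text("original\n")
    (target / "README.md").write_text("untouched\n")

    replaced = apply_overlay(overlay, target)
    patched_content = (target / "examples" / "run.sh").read_text()
    untouched_content = (target / "README.md").read_text()

print(replaced)
```

In practice you would call `apply_overlay(Path("Easy-R1"), Path("path/to/EasyR1"))` with your own checkout location, then run the training script from inside the EasyR1 directory.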

Evaluation

We provide the evaluation pipeline for the R1-Router:

bash evaluation.sh

Alternatively, you can evaluate the results we have provided:

conda activate router
cd src
python evaluate.py --dataset_name all --method "r1-router3"

Acknowledgement

Our work is built on the following codebases, and we are deeply grateful for their contributions.

Citation

We appreciate your citations if you find our paper related and useful to your research!

@article{peng2025r1,
  title={Learning to Route Queries across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning},
  author={Peng, Chunyi and Xu, Zhipeng and Liu, Zhenghao and Li, Yishan and Yan, Yukun and Wang, Shuo and Liu, Zhiyuan and Gu, Yu and Yu, Minghe and Yu, Ge and Sun, Maosong},
  year={2025},
  url={https://arxiv.org/abs/2505.22095}
}

Contact Us

If you have questions, suggestions, or bug reports, please email us. We will try our best to help you.

hm.cypeng@gmail.com
