# PrefRAG

PrefRAG is a novel multi-source ARAG framework that enhances RAG by enabling in-depth and controllable exploration of diverse retrieval sources through preference-driven adaptive retrieval and self-reflection.
## Installation

Install the requirements with pip:

```bash
pip install -r requirements.txt
```

For model inference, we recommend using vLLM to significantly speed up the inference process.
## Data

You can download our standardized datasets (including the corpus, training, and test sets) by running the command below. Due to its large size, the BioASQ-Y/N corpus must be downloaded separately from link.

```bash
bash download/raw_data.sh
```

The data will be downloaded into the `data/` directory.
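The exact schema of the released files is not specified here; as an illustration only, a corpus stored as JSONL (one JSON object per line, with hypothetical `id` and `text` fields) could be loaded with a small helper like this:

```python
import json
from io import StringIO

def load_jsonl(fp):
    """Parse one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in fp if line.strip()]

# Toy stand-in for a downloaded corpus file; the real files live under data/
sample = StringIO(
    '{"id": "d1", "text": "Paris is the capital of France."}\n'
    '{"id": "d2", "text": "The Seine flows through Paris."}\n'
)
docs = load_jsonl(sample)
print(len(docs), docs[0]["id"])  # → 2 d1
```

Swap `StringIO` for `open("data/<dataset>/corpus.jsonl")` once the download has finished, adjusting field names to the actual schema.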
## Retrievers

We implement two types of retrievers:

### Sparse Retriever

Based on the BM25 algorithm implemented in Elasticsearch.
- Download and install the Elasticsearch server:

```bash
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.10.2-linux-x86_64.tar.gz
cd elasticsearch-7.10.2/
./bin/elasticsearch  # Start the server
```

- Create the document index:
```bash
# Take the MuSiQue dataset as an example
cd src/create_index/es
python index_musique.py
```

### Dense Retriever

Based on the bge-large-en-v1.5 model.
- Download the bge-large-en-v1.5 model
- Create document embedding index:
```bash
# Take the MuSiQue dataset as an example
cd src/create_index/emb
python index.py --dataset musique
```

## Training

- Prepare the DPO training dataset:
```bash
# Generate DPO training data with the specified evaluator model and devices
python pre_dpo_data.py --output_path ../data/dpo_data --evaluator_model glm-4-plus --device 0,1,2,3
```

After data generation, use `process_data.ipynb` to customize the proportion of different data types in the generated training set.

- Start training:
```bash
bash train.sh
```

## Inference

```bash
python main.py --method prefrag --retrieve_top_k 5 --dataset musique --model gpt-4o-mini-2024-07-18 --retrieve_method es
```

The inference process and evaluation results can be found in the `output/` directory.
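Conceptually, the adaptive loop alternates preference-driven source selection with self-reflection on the gathered evidence. The toy sketch below is purely illustrative and is not the code behind `main.py`: the overlap-based retriever, the heuristic `reflect` check, and the preference-demotion rule are simplified stand-ins for the real BM25/dense retrievers and LLM-based reflection.

```python
def retrieve(source, question, top_k=5):
    # Stand-in retriever: rank a source's documents by word overlap with the question.
    words = set(question.lower().split())
    ranked = sorted(source["docs"], key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:top_k]

def reflect(question, evidence):
    # Stand-in self-reflection: the real system would ask the LLM if the evidence suffices.
    words = set(question.lower().split())
    covered = set().union(*(set(d.lower().split()) for d in evidence)) if evidence else set()
    return len(words & covered) / max(len(words), 1) > 0.5

def prefrag_evidence(question, sources, max_rounds=3):
    evidence = []
    for _ in range(max_rounds):
        # Preference-driven selection: explore the currently most-preferred source.
        source = max(sources, key=lambda s: s["preference"])
        evidence += retrieve(source, question)
        if reflect(question, evidence):   # stop once the evidence looks sufficient
            break
        source["preference"] -= 1.0       # demote this source and try another next round
    return evidence

sources = [
    {"name": "wiki", "preference": 1.0,
     "docs": ["paris is the capital of france", "rome is the capital of italy"]},
    {"name": "web", "preference": 0.5,
     "docs": ["the eiffel tower is in paris"]},
]
evidence = prefrag_evidence("what is the capital of france", sources)
print(evidence[0])  # → paris is the capital of france
```

The loop terminates either when reflection deems the evidence sufficient or after `max_rounds` source explorations, which is what makes the exploration depth controllable.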
## Results

Here we present partial experimental results across all datasets, where BM25 is used as the retrieval method with the top k = 5 documents retrieved.
| Methods & LLMs | HotpotQA (Acc. / F1 / EM / Avg.) | 2WikiMQA (Acc. / F1 / EM / Avg.) | MuSiQue (Acc. / F1 / EM / Avg.) | BioASQ-Y/N (Acc.) |
|---|---|---|---|---|
| PrefRAG_Llama3.1-8B-Instruct | 42.0 / 51.1 / 38.8 / 44.0 | 42.0 / 43.2 / 35.8 / 40.3 | 15.4 / 21.0 / 12.8 / 16.4 | 89.6 |
| PrefRAG_GLM4-9B-chat | 45.4 / 56.3 / 42.2 / 48.0 | 55.0 / 53.7 / 42.0 / 50.2 | 23.0 / 29.4 / 20.0 / 24.1 | 87.6 |
| PrefRAG_GPT-4o-mini | 58.6 / 66.0 / 50.4 / 56.6 | 76.2 / 72.1 / 59.4 / 69.2 | 28.2 / 34.3 / 21.2 / 27.9 | 92.8 |
| PrefRAG_GLM4-Plus | 59.0 / 68.4 / 55.0 / 60.8 | 79.6 / 76.7 / 65.2 / 73.8 | 32.2 / 39.4 / 27.4 / 33.0 | 94.0 |
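F1 and EM in QA benchmarks like these are typically computed SQuAD-style: exact match after answer normalization, and token-level F1 between prediction and gold answer. A sketch of that standard computation (the normalization in this repo's evaluation code may differ):

```python
import re
import string
from collections import Counter

def normalize(s):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = "".join(ch for ch in s.lower() if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    # 1.0 iff the normalized strings are identical.
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    # Token-level F1 over the normalized answers.
    p, g = normalize(pred).split(), normalize(gold).split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # → 1.0
print(f1("tower in Paris", "eiffel tower"))
```

In the second example, one of three predicted tokens matches one of two gold tokens, giving precision 1/3, recall 1/2, and F1 0.4.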
