This is the official implementation of our paper "Generate rather than Retrieve: Large Language Models are Strong Context Generators", ICLR 2023 [OpenReview] [arXiv].

- Create an environment and install the openai package via `pip install openai`.
- Add your OpenAI API key at `openai.api_key` (line 12) in `inference.py`, as sketched below.
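For reference, the assignment in `inference.py` has roughly this shape (the key string below is a placeholder, not a real key):

```python
import openai

# line 12 of inference.py: set your own OpenAI API key here
openai.api_key = "sk-..."  # placeholder; replace with your real key
```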
Download the datasets:

- From their official websites: [NQ/TriviaQA/WebQ] / [FM2] / [FEVER/Wizard]
- From Google Drive (we unified the formats of the above datasets): [link]

Please put them into the `dataset` folder; it currently contains `webq` and `fm2`.
Step 1: generate the background document.

```bash
python mainfunc.py \
    --dataset {dataset} \
    --task step1 \
    --split test
```
- Note: we use `text-davinci-002` in our experiments, with greedy search in the zero-shot setting to ensure reproducibility; see the sketch below.
- Note: if you have limited access to the OpenAI API, you can use our outputs directly instead of spending money on reproducing our experiments. [zero-shot: step1]
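A minimal sketch of what the step 1 call looks like with the legacy `openai` SDK; the prompt wording and `max_tokens` are illustrative, and the exact prompts live in the repository code:

```python
import openai

openai.api_key = "sk-..."  # your API key, as set in inference.py

# Zero-shot step 1: greedy decoding (temperature=0) with text-davinci-002,
# which makes the generated background document reproducible.
response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=("Generate a background document to answer the given question.\n\n"
            "Question: who founded the city of rome?\nDocument:"),
    temperature=0.0,  # greedy search, per the zero-shot setting above
    max_tokens=200,   # illustrative length budget for the document
)
document = response["choices"][0]["text"].strip()
```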
Step 2: infer the answer from the document.

```bash
python mainfunc.py \
    --dataset {dataset} \
    --task step2 \
    --split test
```
- Trick: we remove the `\n` characters in the generated documents, as sketched below.
- Note: if you have limited access to the OpenAI API, you can use our outputs directly instead of spending money on reproducing our experiments. [zero-shot: step2]
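The `\n`-removal trick amounts to flattening each generated document into a single line before it is fed into the step 2 prompt. A minimal sketch; the prompt template here is an assumption, see `mainfunc.py` for the exact one:

```python
# Flatten the generated document by removing newlines (the "trick" above).
generated_document = "Rome was founded in 753 BC by Romulus.\nAccording to legend..."
flattened = generated_document.replace("\n", " ").strip()

# Step 2 then prompts the model with the document followed by the question.
prompt = f"{flattened}\n\nQuestion: who founded the city of rome?\nAnswer:"
```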
Method 1: use sampling to generate multiple documents.

```bash
python mainfunc.py \
    --dataset {dataset} \
    --task step1 \
    --split test \
    --num_sequence 10 \
    --temperature 0.95
```
- Note: with sample-based decoding, the outputs may differ from run to run, so we cannot guarantee that your output will exactly match the one we provide. [supervised: sampling]
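In code, sampling multiple documents corresponds to requesting several completions at a high temperature; a sketch with the legacy `openai` SDK (the prompt text is illustrative):

```python
import openai

# Method 1: sample 10 documents at temperature 0.95, matching
# --num_sequence 10 --temperature 0.95. Each run may differ.
response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=("Generate a background document to answer the given question.\n\n"
            "Question: who founded the city of rome?\nDocument:"),
    n=10,
    temperature=0.95,
    max_tokens=200,
)
documents = [choice["text"].strip() for choice in response["choices"]]
```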
Method 2: use clustering to generate diverse documents.

```bash
python clusterfunc.py \
    --dataset {dataset} \
    --task step1 \
    --split {split} \
    --num_sequence 1 \
    --temperature 0.95 \
    --clustering
```
- Note: with different in-context demonstrations, the outputs may differ from run to run, so we cannot guarantee that your output will exactly match the one we provide. [supervised: clustering]
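The clustering idea is to embed question-document pairs, partition them with K-means, and draw in-context demonstrations from different clusters so the generated documents cover different perspectives. A minimal sketch, assuming `text-embedding-ada-002` as the embedding model (see `clusterfunc.py` for the actual pipeline):

```python
import numpy as np
import openai
from sklearn.cluster import KMeans

# Embed question-document pairs (contents are illustrative).
pairs = [
    "question: who founded the city of rome? document: ...",
    "question: where is the colosseum located? document: ...",
]
emb = openai.Embedding.create(model="text-embedding-ada-002", input=pairs)
vectors = np.array([item["embedding"] for item in emb["data"]])

# K-means over the embeddings; sampling demonstrations from different
# clusters yields more diverse generated documents.
k = 2
labels = KMeans(n_clusters=k, random_state=0).fit_predict(vectors)
clusters = {c: [p for p, l in zip(pairs, labels) if l == c] for c in range(k)}
```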
Fusion-in-Decoder: train a reader model to infer the answer from documents.

- We use the FiD code from its official GitHub repository [link].
- Download our trained FiD checkpoints from the Hugging Face Hub:
  - GenRead-3B-NQ (performance on the NQ test set: 45.55):

    ```bash
    git lfs install
    git clone https://huggingface.co/wyu1/GenRead-3B-NQ
    ```

  - GenRead-3B-TQA (performance on the TQA test set: 71.55):

    ```bash
    git lfs install
    git clone https://huggingface.co/wyu1/GenRead-3B-TQA
    ```

- If you need checkpoints for other settings, please email wyu1@nd.edu.
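For evaluation with a downloaded checkpoint, a typical FiD invocation looks like the following; the flags follow the FiD repository's README and the data path is a placeholder, so please verify both against the version you clone:

```bash
python test_reader.py \
    --model_path GenRead-3B-NQ \
    --eval_data path/to/nq_test.json \
    --per_gpu_batch_size 1 \
    --name genread_nq_eval \
    --checkpoint_dir checkpoint
```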
Please kindly cite our paper if you find the paper and the code helpful.

```bibtex
@inproceedings{yu2023generate,
  title={Generate rather than retrieve: Large language models are strong context generators},
  author={Yu, Wenhao and Iter, Dan and Wang, Shuohang and Xu, Yichong and Ju, Mingxuan and Sanyal, Soumya and Zhu, Chenguang and Zeng, Michael and Jiang, Meng},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2023}
}
```