If you want to reuse the dense passage retrieval (DPR) model from our paper, first follow the preprocessing steps to generate the pseudo documents for all datasets. Then you can configure and apply the model to your queries and datasets.
For preprocessing, please:
- Use IlluSnip to extract the top-$k$ triples for each RDF dataset, and store the results in a local MySQL database table. Remember to adjust the paths and database connection to your local settings.
- Run `create_pseudo_document.py` to create the two pseudo documents for each RDF dataset.
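To illustrate what this step produces, here is a minimal sketch of turning a dataset's extracted top-$k$ triples into a flat pseudo document. The function name, triple representation, and serialization strategy are assumptions for illustration only; the actual logic lives in `create_pseudo_document.py`.

```python
# Hypothetical sketch: flatten top-k (subject, predicate, object) triples
# into a single text string usable as a pseudo document for retrieval.
def triples_to_pseudo_document(triples):
    """Join the terms of each triple, then join all triples with spaces."""
    return " ".join(" ".join(term for term in triple) for triple in triples)

triples = [
    ("Berlin", "isCapitalOf", "Germany"),
    ("Germany", "hasPopulation", "83 million"),
]
print(triples_to_pseudo_document(triples))
```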
We use the implementation of Karpukhin et al. (2020). To obtain the model, run:

```shell
git clone https://github.com/facebookresearch/DPR.git
```
Then you should:
- Configure the paths to the pseudo documents and queries in `conf/ctx_sources/` and `conf/datasets/` to match your local settings. We use the ACORDAR queries in our experiments.
- Follow the instructions in the original README to use DPR. We also provide an integrated script, `pipeline.sh`, for this process.
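For reference, DPR's stock context sources read passages from a tab-separated file with `id`, `text`, and `title` columns (the layout of its Wikipedia passage dump). Assuming the pseudo documents are supplied in the same layout, a minimal conversion sketch could look like the following; the function name and the example document IDs are hypothetical.

```python
import csv
import io

# Hypothetical sketch: write pseudo documents in the TSV layout that DPR's
# default context sources expect (header row: id, text, title).
def write_dpr_tsv(docs, out):
    """docs: iterable of (doc_id, text, title) tuples; out: writable text stream."""
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["id", "text", "title"])
    for doc_id, text, title in docs:
        writer.writerow([doc_id, text, title])

docs = [("1", "Berlin isCapitalOf Germany", "dataset-0001")]
buf = io.StringIO()
write_dpr_tsv(docs, buf)
print(buf.getvalue())
```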
- To run the script, configure the paths to your local settings. For example, change
`$base_path/dpr/downloads/checkpoint/retriever/single-adv-hn/nq/bert-base-encoder.cp`
to your local path of the BERT encoder checkpoint.