# DPR

To reuse the dense passage retrieval (DPR) model from our paper, first follow the preprocessing steps below to generate the pseudo documents for all datasets. Then configure and apply the model to your queries and datasets.

## Preprocess

To preprocess the data:

1. Use IlluSnip to extract the top-$k$ triples for each RDF dataset and store the results in a local MySQL database table. Be sure to configure the paths and database connection to match your local settings.
2. Run `create_pseudo_document.py` to create the two pseudo documents for each RDF dataset (see the sketch after this list).
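
A minimal sketch of this step, assuming `create_pseudo_document.py` picks up the MySQL settings configured inside the script; the user, database, and table names below are placeholders for your local setup:

```bash
# Hedged sketch; the names below are placeholders for the MySQL settings
# you configured when running IlluSnip.
DB_USER=root                 # placeholder MySQL user
DB_NAME=acordar              # placeholder database holding IlluSnip output
TABLE=illusnip_triples       # placeholder table of top-k triples

# Sanity-check that IlluSnip populated the table:
mysql -u "$DB_USER" -p -e "SELECT COUNT(*) FROM $DB_NAME.$TABLE;"

# Build the two pseudo documents for each RDF dataset:
python create_pseudo_document.py
```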

## Use DPR

We use the implementation of Karpukhin et al. (2020). To obtain the code, clone the repository:

```bash
git clone https://github.com/facebookresearch/DPR.git
```
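
If DPR is not yet set up locally, the DPR repository's README installs it from source:

```bash
# Install DPR and its dependencies from source (per the DPR README).
cd DPR
pip install .
```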

Then you should:

1. In `conf/ctx_sources/` and `conf/datasets/`, point the paths for the pseudo documents and queries to your local settings. We use the ACORDAR queries in our experiments.
2. Follow the instructions in the original DPR README to run retrieval. We also provide an integrated script, `pipeline.sh`, for this process (see the sketch after this list).
   - Before running the script, configure the paths to your local settings. For example, change `$base_path/dpr/downloads/checkpoint/retriever/single-adv-hn/nq/bert-base-encoder.cp` to the local path of your BERT encoder checkpoint.
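
For reference, a hedged sketch of the two DPR steps that `pipeline.sh` wraps, following the usage described in the DPR README. Here `pseudo_docs` and `acordar_queries` are hypothetical names for the entries you registered in `conf/ctx_sources/` and `conf/datasets/` in step 1, and the embedding and output paths are placeholders:

```bash
# Hedged sketch (per the DPR README); resource names and output paths below
# are placeholders that must match your Hydra config entries.
checkpoint=$base_path/dpr/downloads/checkpoint/retriever/single-adv-hn/nq/bert-base-encoder.cp

# 1. Encode the pseudo documents with the pretrained bi-encoder.
python generate_dense_embeddings.py \
    model_file=$checkpoint \
    ctx_src=pseudo_docs \
    shard_id=0 num_shards=1 \
    out_file=$base_path/embeddings/pseudo_docs

# 2. Retrieve the top pseudo documents for the ACORDAR queries.
#    Note: "ctx_datatsets" is spelled this way in DPR's own config.
python dense_retriever.py \
    model_file=$checkpoint \
    qa_dataset=acordar_queries \
    ctx_datatsets=[pseudo_docs] \
    encoded_ctx_files=[\"$base_path/embeddings/pseudo_docs_*\"] \
    out_file=$base_path/results/dpr_results.json
```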