suzhou-22/TS-Retriever

Time-sensitive Retrieval-Augmented Generation for Question Answering

This repository contains the SFT models, the benchmark dataset, and the evaluation code for our paper.

Dataset

  • evaluation/data/nobel_prize: the test benchmark dataset, including queries and corpus.
  • train/dataset/sft/: the training dataset used to fine-tune the contriever.
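
To take a quick look at the SFT data before training, a minimal sketch such as the following can be used; it only assumes the files are standard JSON Lines and does not presume any particular field names:

import json

# Path taken from the training commands below.
path = "train/dataset/sft/contriever_finetune_train_v3.jsonl"

with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"{len(records)} training examples")
print("fields in the first example:", sorted(records[0].keys()))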

SFT models

The following models are available:

  • Tscontriever: the model fine-tuned on positive and negative sample pairs with temporal constraints.
  • Tscontriever_query_only: the query-side fine-tuned model.
  • Router: a simple classifier used to route queries.

The model weights can be downloaded from this Baidu Netdisk link or this Google Drive link.
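
Contriever checkpoints are plain BERT-style encoders, so if the released weights follow the usual Hugging Face layout they can be loaded with transformers. A minimal sketch, assuming the Tscontriever weights were unpacked to evaluation/models/Tscontriever (the exact folder name is an assumption):

from transformers import AutoModel, AutoTokenizer

# Assumed location of the downloaded weights; adjust to your setup.
model_dir = "evaluation/models/Tscontriever"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModel.from_pretrained(model_dir)
model.eval()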

Evaluation

We pre-computed and stored the embeddings of the queries and of the documents to be retrieved. You can reproduce the results on our benchmark with the following commands; a sketch of the scoring step over these embeddings appears after the command list.

The embeddings can be downloaded from this Baidu Netdisk link or this Google Drive link.

  • For the Tscontriever result:
cd evaluation
embed_model_query="Tscontriever"
embed_model_doc="Tscontriever"
query_embed_save_dir="./temp_embed_files/query"
doc_embed_save_dir="./temp_embed_files/docs"
python experiment.py "${embed_model_query}" "${embed_model_doc}" "${query_embed_save_dir}" "${doc_embed_save_dir}/${embed_model_doc}_embed_doc"
  • For the Tscontriever_query_only result:
cd evaluation
embed_model_query="Tscontriever_query_only"
embed_model_doc="Tscontriever"
query_embed_save_dir="./temp_embed_files/query"
doc_embed_save_dir="./temp_embed_files/docs"
python experiment.py "${embed_model_query}" "${embed_model_doc}" "${query_embed_save_dir}" "${doc_embed_save_dir}/${embed_model_doc}_embed_doc"
  • For the Tscontriever_query_only_with_router result:
cd evaluation
embed_model_query="Tscontriever_with_router"
embed_model_doc="Tscontriever"
query_embed_save_dir="./temp_embed_files/query"
doc_embed_save_dir="./temp_embed_files/docs"
python experiment.py "${embed_model_query}" "${embed_model_doc}" "${query_embed_save_dir}" "${doc_embed_save_dir}/${embed_model_doc}_embed_doc"
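
For reference, once the query and document embeddings are available, the scoring step is a plain inner-product search. The sketch below assumes the embeddings are stored as NumPy arrays with the file names shown, which is an assumption; the actual format consumed by experiment.py may differ.

import numpy as np

# Hypothetical file names; the real embedding files come from the download above.
query_emb = np.load("temp_embed_files/query/query_embeddings.npy")  # (n_queries, dim)
doc_emb = np.load("temp_embed_files/docs/doc_embeddings.npy")       # (n_docs, dim)

# Dense retrieval: inner-product similarity, then take the top-k documents per query.
scores = query_emb @ doc_emb.T
top_docs = np.argsort(-scores, axis=1)[:, :10]
print(top_docs[0])  # indices of the 10 best-scoring documents for the first query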

Alternatively, you can download the model weights and encode the queries and the documents to be retrieved yourself to reproduce the results. Follow these steps (a sketch of the encoding step appears after the list):

  1. Download the model weights and place them in the evaluation/models folder.
  2. Navigate to the evaluation directory.
  3. Run the command: bash ./eval.sh.
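
Step 3 runs the encoding through eval.sh. For orientation, a Contriever-style embedding is the mean of the token embeddings over non-padding positions; the sketch below shows that pooling, assuming the released weights load as a standard Hugging Face encoder (the model path is an assumption).

import torch
from transformers import AutoModel, AutoTokenizer

model_dir = "evaluation/models/Tscontriever"  # assumed path to the downloaded weights
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModel.from_pretrained(model_dir).eval()

def embed(texts):
    # Contriever-style mean pooling over non-padding tokens.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()    # (batch, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # (batch, dim)

print(embed(["Who won the Nobel Prize in Physics in 2017?"]).shape)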

Training

The training code is based on the contriever repository with slight modifications. To train the models, use the following commands; an illustrative sketch of the training objective appears after them.

  • For Tscontriever:
python ./train/contriever/contriever/finetuning.py \
    --model_path <your contriever model path> \
    --eval_data ./train/dataset/sft/contriever_finetune_eval_v3.jsonl \
    --train_data ./train/dataset/sft/contriever_finetune_train_v3.jsonl \
    --save_freq 5000 \
    --eval_freq 100 \
    --random_init false \
    --total_steps 1500 \
    --negative_ctxs 1
  • For Tscontriever_query_only:
python ./train/contriever/finetuning_frozen.py \
    --model_path <your contriever model path> \
    --eval_data ./train/dataset/sft/contriever_finetune_eval_v3.jsonl \
    --train_data ./train/dataset/sft/contriever_finetune_train_v3.jsonl \
    --save_freq 5000 \
    --eval_freq 100 \
    --random_init false \
    --total_steps 1500 \
    --negative_ctxs 1
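
With --negative_ctxs 1, each query is paired with one positive and one hard negative passage. The following is an illustrative InfoNCE-style objective for that setup, not the exact code in finetuning.py; the temperature value and the absence of in-batch negatives are assumptions made for the sketch.

import torch
import torch.nn.functional as F

def contrastive_loss(q, pos, neg, temperature=0.05):
    # q, pos, neg: (batch, dim) embeddings of queries, positive and negative passages.
    # The positive is placed in column 0 of the logits, so the target label is 0.
    pos_score = (q * pos).sum(dim=-1, keepdim=True)   # (batch, 1)
    neg_score = (q * neg).sum(dim=-1, keepdim=True)   # (batch, 1)
    logits = torch.cat([pos_score, neg_score], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Toy check with random embeddings.
q, pos, neg = (torch.randn(4, 768) for _ in range(3))
print(contrastive_loss(q, pos, neg))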
