# Conversational Passage Retrieval

This repository contains our work on conversational passage retrieval for the course DAT640 at the University of Stavanger. The project report can be found in `Conversational Passage Retrieval.pdf`. All results are in the `res` folder.
- Conversational Passage Retrieval
- Downloading Collection
- Environment Setup
- Code Execution
- Main Runner
- Question Rewriting Using Chat-GPT4
- SPLADE
- Evaluation
- Pre Run Files
## Downloading Collection

```bash
mkdir data
wget --output-document data/msmarco-passage.tar.gz https://gustav1.ux.uis.no/dat640/msmarco-passage.tar.gz
tar -xzvf data/msmarco-passage.tar.gz -C data/
rm data/msmarco-passage.tar.gz
```
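The archive contains the MS MARCO passage collection. Assuming the standard MS MARCO layout (a TSV file with a passage id and the passage text per line; adjust the path below to wherever the archive actually extracted), a quick sanity check can look like this:

```python
# Quick sanity check of the extracted collection (illustrative sketch).
# Assumes the standard MS MARCO layout: one "<pid>\t<passage text>" line per passage.
import csv

with open("data/collection.tsv", newline="", encoding="utf-8") as f:  # adjust path if needed
    reader = csv.reader(f, delimiter="\t")
    for i, (pid, text) in enumerate(reader):
        print(pid, text[:80])
        if i == 2:  # print only the first three passages
            break
```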
## Environment Setup

To install from the environment file, run:

```bash
conda env create -f environment.yaml
```
## Code Execution

To run the baseline:

```bash
python3 baseline.py
```
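The baseline implementation lives in `baseline.py`; as a rough illustration of what first-stage BM25 retrieval over the collection can look like (the passages, names, and library choice below are our own, not the repository's), consider:

```python
# Illustrative BM25 first-stage retrieval sketch (not the repository's code).
# Assumes `pip install rank-bm25` and the collection loaded as (pid, text) pairs.
from rank_bm25 import BM25Okapi

passages = [
    ("p1", "The International Linguistics Olympiad is a contest for students."),
    ("p2", "BM25 is a classic lexical ranking function."),
]
pids = [pid for pid, _ in passages]
corpus = [text.lower().split() for _, text in passages]  # trivial whitespace tokenization

bm25 = BM25Okapi(corpus)
query = "what is the international linguistics olympiad".split()
scores = bm25.get_scores(query)

# Rank passage ids by descending BM25 score.
ranking = sorted(zip(pids, scores), key=lambda x: x[1], reverse=True)
print(ranking)
```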
For re-ranking we used a Hugging Face BERT model fine-tuned on the MS MARCO dataset. To run the re-ranking:

```bash
python3 reranking.py
```
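The exact model is configured in `reranking.py`; as a hedged illustration of how a BERT cross-encoder fine-tuned on MS MARCO can re-score first-stage candidates (the model name below is a common public choice on the Hugging Face hub, not necessarily the one we used):

```python
# Illustrative cross-encoder re-ranking sketch (model name is an assumption).
# Assumes `pip install sentence-transformers`.
from sentence_transformers import CrossEncoder

# A publicly available MS MARCO cross-encoder from the Hugging Face hub.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "difficulty level of the international linguistics olympiad exam"
candidates = [
    ("p1", "The International Linguistics Olympiad exam is considered very challenging."),
    ("p2", "BM25 is a classic lexical ranking function."),
]

# Score each (query, passage) pair and sort candidates by descending relevance.
scores = model.predict([(query, text) for _, text in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
for (pid, _), score in reranked:
    print(pid, float(score))
```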
## Main Runner

Alternatively, you can run the main file `conversational_passage_retrieval.py`, which runs the baseline or advanced pipeline depending on the parameters you pass.

The parser accepts the following command-line arguments, which control data processing and model configuration:
| Type | Choices | Default | Description |
|------|---------|---------|-------------|
| `bool` | – | `False` | Enables or disables preprocessing of the dataset into SPLADE format. |
| `str` | `'a'`, `'b'` | `'a'` | Selects the model type: `'b'` is the baseline model, `'a'` the advanced model. |
| `str` | `'train'`, `'test'` | `'train'` | Specifies the type of queries to convert. |
| `bool` | – | `True` | Determines whether to use rewritten queries. |
| `str` | – | `'../data/'` | Path to the dataset. |
| `str` | – | `'../res/'` | Path for the output file. |
| `str` | `'bm25'`, `'splade'` | `'splade'` | Chooses the ranking method. |
| `str` | – | `'../res/'` | Path to the preprocessed dataset. |
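The actual flag names are defined in `conversational_passage_retrieval.py`; the sketch below only shows how a parser with these types, choices, and defaults can be declared with `argparse`. All flag names here (`--preprocess`, `--model`, and so on) are hypothetical placeholders, not the repository's real ones:

```python
# Hypothetical argparse sketch mirroring the options in the table above.
# Flag names are illustrative placeholders; the real names live in
# conversational_passage_retrieval.py.
import argparse

parser = argparse.ArgumentParser(description="Conversational passage retrieval runner")
parser.add_argument("--preprocess", action="store_true",
                    help="Preprocess the dataset into SPLADE format (default: False)")
parser.add_argument("--model", choices=["a", "b"], default="a",
                    help="'b' = baseline model, 'a' = advanced model")
parser.add_argument("--query-type", choices=["train", "test"], default="train",
                    help="Type of queries to convert")
parser.add_argument("--use-rewritten", action=argparse.BooleanOptionalAction, default=True,
                    help="Use rewritten queries")
parser.add_argument("--data-path", default="../data/", help="Path to the dataset")
parser.add_argument("--output-path", default="../res/", help="Path for the output file")
parser.add_argument("--ranking", choices=["bm25", "splade"], default="splade",
                    help="Ranking method")
parser.add_argument("--preprocessed-path", default="../res/",
                    help="Path to the preprocessed dataset")

args = parser.parse_args()
print(args)
```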
## Question Rewriting Using Chat-GPT4

Rewritten queries are already stored with the other input data as `queries_test_qr.csv` and `queries_train_qr.csv`.

To run the question rewriting using GPT-4, you can use our custom GPT instance at the following link: https://chat.openai.com/g/g-R920EZnY6-query-rewriting-gpt

If you do not have a Plus subscription, you can still give it a try with the free ChatGPT-3.5, using the following as the first message:
```
You are tasked with the role of "Query Rewriter" for a Conversational Passage
Retrieval system. In conversational queries, subsequent questions may lack
essential details present in prior interactions. Your goal is to integrate
context from previous queries to rewrite the current query into a more
detailed and standalone search query. This will ensure that the rewritten
query is optimized for retrieving the most relevant passage, even without
the conversational context.

Example:
Conversational Queries:
- Tell me about the International Linguistics Olympiad.
- How do I prepare for it?
- How tough is the exam?

Rewritten Queries:
- International Linguistics Olympiad overview.
- Preparation methods for the International Linguistics Olympiad.
- Difficulty level of the International Linguistics Olympiad exam.
```
In both cases, copy the set of queries for each topic into the chat window and save the output to a new file. GPT-4 can handle all the queries at once, but we did not try this with GPT-3.5. The files with the rewritten queries, `queries_test_qr.csv` and `queries_train_qr.csv`, are included in the repository and can be used directly.
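If you prefer to script the rewriting instead of using the chat UI, a minimal sketch with the OpenAI Python client could look like the following. This is our own illustration, not part of the repository; it assumes an `OPENAI_API_KEY` environment variable and reuses the system prompt shown above:

```python
# Illustrative query rewriting via the OpenAI API (not part of this repo).
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

SYSTEM_PROMPT = 'You are tasked with the role of "Query Rewriter" ...'  # full prompt above

client = OpenAI()
conversation = [
    "Tell me about the International Linguistics Olympiad.",
    "How do I prepare for it?",
    "How tough is the exam?",
]

# Send the whole topic at once, as we did with GPT-4.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "\n".join(f"- {q}" for q in conversation)},
    ],
)
print(response.choices[0].message.content)
```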
## SPLADE

- Clone the SPLADE repository.
- Create the SPLADE environment according to the instructions in that repository.
- Move our collection to `data/msmarco/full_collection/raw.tsv` in the SPLADE repository.
- Move our configuration file `config_splade++_cocondenser_ensembledistil_OURS.yaml` to `conf/config_splade++_cocondenser_ensembledistil_OURS.yaml`.
- Prepare the environment:

```bash
conda activate splade_env
export PYTHONPATH=$PYTHONPATH:$(pwd)
export SPLADE_CONFIG_NAME="config_splade++_cocondenser_ensembledistil_OURS"
```
- Run indexing:

```bash
python3 -m splade.index \
  init_dict.model_type_or_dir=naver/splade-cocondenser-ensembledistil \
  config.pretrained_no_yamlconfig=true \
  config.index_dir=experiments/pre-trained/index
```
- Move the queries in TSV format to `data/msmarco/dev_queries/raw.tsv` (use `convert_csv_to_tsv.py` if your queries are in a CSV file; a sketch of such a conversion follows the ranking command below).
- Run ranking:
```bash
python3 -m splade.retrieve \
  init_dict.model_type_or_dir=naver/splade-cocondenser-ensembledistil \
  config.pretrained_no_yamlconfig=true \
  config.index_dir=experiments/pre-trained/index \
  config.out_dir=experiments/pre-trained/out
```
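The conversion step above refers to `convert_csv_to_tsv.py`; below is a minimal sketch of what such a conversion can look like, assuming a CSV with a query-id column followed by the query text (check the actual script for the exact columns it expects):

```python
# Minimal CSV -> TSV conversion sketch, equivalent in spirit to convert_csv_to_tsv.py.
# Assumes each CSV row starts with "<query id>,<query text>"; adjust to your data.
import csv
import sys

src, dst = sys.argv[1], sys.argv[2]  # e.g. queries_test_qr.csv raw.tsv
with open(src, newline="", encoding="utf-8") as fin, \
     open(dst, "w", newline="", encoding="utf-8") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout, delimiter="\t")
    for row in reader:
        writer.writerow(row[:2])  # keep only the id and text columns
```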
## Evaluation

For evaluation we use the TREC eval tool:

```bash
git clone https://github.com/usnistgov/trec_eval.git
cd trec_eval
make
./trec_eval -c -m recall.1000 -m map -m recip_rank -m ndcg_cut.3 -l2 -M1000 qrels_train.txt {YOUR_TREC_RUNFILE}
```
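`trec_eval` expects run files in the standard six-column TREC format: query id, the literal `Q0`, document id, rank, score, and a run tag. For example (the document ids here are made up for illustration):

```
31_1 Q0 msmarco_passage_01_123456 1 14.27 splade
31_1 Q0 msmarco_passage_02_654321 2 13.90 splade
```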
## Pre Run Files

To avoid running the whole pipeline, we have included the results from all the different methods in the `res` folder, together with the scores in `scores.csv` and the Jupyter notebook `plots.ipynb` for generating the plots and visualizing the results.