This project uses a three-stage Retrieval-Augmented Generation (RAG) pipeline to extract answers from large document collections. The system combines Sparse and Dense Retrieval, Reranking, and Machine Reading Comprehension (MRC) to provide accurate answers from Wikipedia. Each stage is managed in a separate branch; this README provides an overview and points to each branch for detailed setup and usage instructions.
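As a rough illustration of how the first stage can combine sparse and dense signals, the sketch below mixes a TF-IDF cosine score with a bi-encoder dot-product score per document. This is only a sketch: the `dense_encoder` callable, the `alpha` weight, and the scoring details are assumptions, and the project's actual retrieval code lives in the `main-Retrieval` branch.

```python
# Illustrative hybrid sparse + dense retrieval scoring (not the code used in
# the main-Retrieval branch; dense_encoder and alpha are placeholders).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def hybrid_retrieve(query, documents, dense_encoder, alpha=0.5, top_k=100):
    """Return the indices of the top_k documents by a weighted sparse/dense score."""
    # Sparse score: TF-IDF cosine similarity (sklearn vectors are L2-normalized).
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(documents)        # (n_docs, vocab)
    query_vec = vectorizer.transform([query])             # (1, vocab)
    sparse_scores = (doc_vecs @ query_vec.T).toarray().ravel()

    # Dense score: inner product of embeddings. dense_encoder is assumed to map
    # a list of strings to an (n, dim) NumPy array, e.g. a bi-encoder.
    doc_emb = dense_encoder(documents)
    dense_scores = doc_emb @ dense_encoder([query])[0]

    # Weighted mix; in practice both score distributions are usually normalized
    # (e.g. min-max) before being combined.
    scores = alpha * sparse_scores + (1 - alpha) * dense_scores
    return np.argsort(scores)[::-1][:top_k]
```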
Wrap-up Reports: ODQA with RAG
To replicate the environment used for each model, refer to the `environment.yml` file in each branch. Set up the environment by running:

```bash
$ conda env create -f environment.yml
```

System requirements: Ubuntu 20.04.6 LTS
Each branch (`main-Retrieval`, `main-Reranker`, `main-MRC`) specifies the exact Python and PyTorch versions used for its respective model.
- Wikipedia Document Corpus: Used in the Retrieval stage; located at `./data/raw/wikipedia_documents.json` and comprising approximately 57,000 unique documents.
- MRC Dataset: Used in the MRC stage; stored at `./data/raw/train_dataset` and `./data/raw/test_dataset`.
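For reference, the snippet below shows one way these files could be loaded, assuming the Wikipedia corpus is a plain JSON mapping of document ids to records with a `text` field and the MRC datasets were saved with Hugging Face `datasets` (`save_to_disk`). These are assumptions; the authoritative loading code is in each branch.

```python
# Sketch of loading the raw data; the "text" field and the use of
# datasets.load_from_disk are assumptions, so check the branch code if they differ.
import json
from datasets import load_from_disk

# Wikipedia corpus: roughly 57,000 unique documents.
with open("./data/raw/wikipedia_documents.json", encoding="utf-8") as f:
    wiki = json.load(f)
passages = list({doc["text"] for doc in wiki.values()})  # deduplicate passage text

# MRC splits used for training and final evaluation.
train_dataset = load_from_disk("./data/raw/train_dataset")
test_dataset = load_from_disk("./data/raw/test_dataset")
print(len(passages), train_dataset, test_dataset)
```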
- Retrieval Branch (`main-Retrieval`): Filters the 100 most relevant documents per query from the large Wikipedia corpus.
- Reranker Branch (`main-Reranker`): Reranks the top 100 documents per query. The output can be analyzed for performance and is passed to the MRC stage.
- MRC Branch (`main-MRC`): Processes the final top-k reranked documents per query, extracting exact answers from the contexts with a machine reading comprehension model.
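Putting the three branches together, the per-query flow looks roughly like the sketch below. Every function here is a placeholder for code implemented in the corresponding branch; this repository does not expose these exact APIs.

```python
# High-level sketch of the three-stage ODQA flow. retrieve_top_k, rerank and
# extract_answer stand in for the main-Retrieval, main-Reranker and main-MRC
# components; their names and signatures are placeholders, not real APIs.
from typing import Callable, List

def answer_question(
    question: str,
    corpus: List[str],
    retrieve_top_k: Callable[[str, List[str], int], List[str]],  # stage 1
    rerank: Callable[[str, List[str], int], List[str]],          # stage 2
    extract_answer: Callable[[str, List[str]], str],             # stage 3
    retrieval_k: int = 100,
    rerank_k: int = 10,  # the exact k passed to MRC is branch-specific
) -> str:
    # Stage 1: filter ~100 candidate documents for the query from the corpus.
    candidates = retrieve_top_k(question, corpus, retrieval_k)
    # Stage 2: rerank the candidates and keep only the best passages.
    top_passages = rerank(question, candidates, rerank_k)
    # Stage 3: extract an exact answer span from the passages with the MRC model.
    return extract_answer(question, top_passages)
```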
For details on installation, environment setup, and model usage, refer to each branch's README.
To assess model performance, the following metrics are used:
- Exact Match (EM): Awards a score only when the model's prediction exactly matches the true answer. Each question is scored as either 0 or 1.
- F1 Score: Unlike EM, the F1 Score gives partial credit based on word overlap between the prediction and the true answer. For instance, if the correct answer is "Barack Obama" but the prediction is "Obama," the EM score is 0, while the F1 Score awards partial credit for the overlapping word.
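To make the difference concrete, here is a toy whitespace-token implementation of the two metrics in the spirit of SQuAD-style evaluation; the project's actual scorer may apply additional text normalization (casing, punctuation, tokenization).

```python
# Toy EM / F1 over whitespace tokens (illustrative; the real evaluation script
# may normalize answers differently).
from collections import Counter

def exact_match(prediction: str, answer: str) -> int:
    return int(prediction.strip() == answer.strip())

def f1_score(prediction: str, answer: str) -> float:
    pred_tokens, ans_tokens = prediction.split(), answer.split()
    overlap = sum((Counter(pred_tokens) & Counter(ans_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ans_tokens)
    return 2 * precision * recall / (precision + recall)

# The example above: prediction "Obama" vs. answer "Barack Obama".
print(exact_match("Obama", "Barack Obama"))           # 0
print(round(f1_score("Obama", "Barack Obama"), 2))    # 0.67
```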
Final results of the full pipeline:

- EM: 62.92%
- F1 Score: 73.46%
These metrics provide insight into both the accuracy and partial correctness of the model’s predictions across all stages of the pipeline.