conda create --name venv python=3.10 -y
conda activate venv
pip install -r requirements.txt
- Update LLAMAINDEX_API_KEY in .env file
- Run:
python3 prepare_data.py </path/to/pdf>
- Cited research papers are fetched from the input PDF and stored in pdf_papers
- PDFs are parsed into .txt format using LlamaIndex and stored in txt_papers
- Implemented BM25 and BERT-based retrieval methods in the retreive_docs.ipynb file
- We qualitatively found the ColBERT results to be better than the BM25 results.
- Update PAPER_NAME = *.txt in .env file
- Run
sudo apt install jupyter-nbconvert
jupyter nbconvert --execute --inplace colbert.ipynb
- Retrieves the most relevant papers from txt_papers using the ColBERT model.
- Stores most relevant papers in final_input directory in .txt format
- The final_input directory is used as the input for running Microsoft Graph RAG.
- cp final_input/* /path/to/input
- /path/to/input directory refers to the input directory of the MS KG-RAG implementation
- Run KG-RAG, CLI app with QnA session
- Sample Responses generated by the Knowledge Graphs Terminal and ResearchNexus Terminal are also attached.
- answers_knowledgegraphs.txt contains answers to queries generated by Knowledge Graphs.
- answers_researchnexus.txt contains answers to queries generated by ResearchNexus (our proposed pipeline).
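
For readers unfamiliar with the BM25 baseline mentioned in the retrieval step above, the following is a minimal, self-contained sketch of Okapi BM25 ranking over plain-text documents. It is not the code from retreive_docs.ipynb: the function names `tokenize` and `bm25_rank`, the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the sample file names are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; the notebook may differ).
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (doc_name, score) pairs sorted by descending Okapi BM25 score.

    `docs` maps a document name to its raw text.
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / max(n_docs, 1)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    scores = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-ins for files in txt_papers.
papers = {
    "a.txt": "graph retrieval augmented generation with knowledge graphs",
    "b.txt": "image classification with convolutional networks",
}
ranking = bm25_rank("knowledge graphs retrieval", papers)
top_paper = ranking[0][0]
```

In practice the `papers` dictionary could be filled by reading each .txt file in txt_papers. BM25 only matches exact query terms, which is one plausible reason a dense model such as ColBERT can look better on a qualitative comparison.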