
This repository contains setup and usage for RAG that queries text from articles published on arxiv.org


GlockPL/ArxivRAG


RAG App for Arxiv Articles from cs_AI category

Setup

Create a .env file containing an API key for Google AI Studio.
A simple template is provided in the file env_template.
The key can be generated for free in Google AI Studio.
The dataset is large, so the free tier may not be enough to index all of it.
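As a minimal sketch of the step above, the template can be copied into place from the repository root. The helper name init_env is hypothetical and not part of the repository; the variable names to fill in are whatever env_template defines.

```shell
# Hypothetical helper: create .env from env_template without
# overwriting an existing .env that may already hold a real key.
init_env() {
  if [ -e .env ]; then
    echo ".env already exists, leaving it untouched"
  else
    cp env_template .env
    echo "Created .env from env_template; add the API key before starting the stack"
  fi
}
```

Run `init_env` from the repository root, then open .env in an editor and paste in the key.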

Run docker compose file:

Clone the repository.
Create a folder named backups in the root of the repository and run:

docker-compose -f docker-compose-prod.yml up --build -d

This will build the docker image with the repo and launch all the databases necessary to run the project.
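The containers may take a moment to accept requests after they start. The sketch below waits for Weaviate by polling its readiness endpoint. It assumes Weaviate is exposed at http://localhost:8080, as in the curl commands in this README; the function name wait_for_weaviate and the retry settings are illustrative.

```shell
# Hypothetical helper: poll Weaviate's readiness endpoint until it answers.
# $1 = base URL (assumed default: http://localhost:8080), $2 = max attempts.
wait_for_weaviate() {
  base_url="${1:-http://localhost:8080}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors, -s silences progress output
    if curl -sf "${base_url}/v1/.well-known/ready" >/dev/null; then
      echo "Weaviate is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "Weaviate did not become ready after ${attempts} attempts" >&2
  return 1
}
```

Calling `wait_for_weaviate` with no arguments uses the defaults; a non-zero exit status suggests checking the container logs.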

Restoring weaviate db from backup:

You can download the prepared database from here
After downloading, unpack the data into the backups directory and run this command from a terminal:

curl -X POST -H "Content-Type: application/json" -d '{"id": "arxiv-backup-v_1_0"}' http://localhost:8080/v1/backups/filesystem/arxiv-backup-v_1_0/restore

This will load the content of the backup into the database.
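The restore request returns before the data is fully loaded, so it can help to check progress before opening the app. The sketch below polls the restore status endpoint for the same backup id. It assumes the response carries a status field that ends in SUCCESS or FAILED, as described in the Weaviate backup API; the function name wait_for_restore and the timing values are illustrative.

```shell
# Hypothetical helper: poll the restore status until it finishes.
# $1 = status URL (defaults to the backup id used in this README), $2 = max checks.
wait_for_restore() {
  url="${1:-http://localhost:8080/v1/backups/filesystem/arxiv-backup-v_1_0/restore}"
  attempts="${2:-60}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    body="$(curl -s "$url")"
    case "$body" in
      *SUCCESS*) echo "restore finished"; return 0 ;;
      *FAILED*)  echo "restore failed: $body" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep 5
  done
  echo "restore still running after ${attempts} checks" >&2
  return 2
}
```

Run `wait_for_restore` after the POST request; exit status 0 means the backup was loaded.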
Open http://localhost in a browser and register a new user.

Indexing data

Run docker compose:

docker compose up -d

This will start the Weaviate database. Then, using Python 3.11, run:

poetry install

Next, run the indexing script:

poetry run ./rag/indexing.py

Creating backup for weaviate:

curl -X POST -H "Content-Type: application/json" -d '{"id": "arxiv-backup-v_1_0"}' http://localhost:8080/v1/backups/filesystem
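Backup creation also runs in the background. As a rough sketch, the status of the backup can be read back with a GET request on the same backup id. The function name backup_status is hypothetical, and the sed expression assumes the JSON response contains a "status" field with an upper-case value.

```shell
# Hypothetical helper: print the status field of a filesystem backup.
# $1 = backup id (defaults to the id used in this README).
backup_status() {
  id="${1:-arxiv-backup-v_1_0}"
  curl -s "http://localhost:8080/v1/backups/filesystem/${id}" |
    sed -n 's/.*"status" *: *"\([A-Z_]*\)".*/\1/p'
}
```

Once `backup_status` prints SUCCESS, the files in the backups directory should be complete and safe to copy elsewhere.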
