Create a .env file with an API key for Google AI Studio.
A simple template is in the file env_template.
A key can be generated for free here.
The dataset is large, so the free tier may not be enough to index the entire dataset.
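For illustration, the .env file could look like the sketch below. The variable name GOOGLE_API_KEY is an assumption for this example; check env_template for the actual name the project expects.

```
# .env — sketch only; the variable name is an assumption, see env_template
GOOGLE_API_KEY=your-google-ai-studio-key
```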
Clone the repository.
Create a backups folder inside the main folder of the repository and run:
docker-compose -f docker-compose-prod.yml up --build -d
This will build the Docker image for the repository and launch all the databases needed to run the project.
You can download the prepared database from here.
After downloading, unpack the data into the backups directory and run this command from the terminal:
curl -X POST -H "Content-Type: application/json" -d '{"id": "arxiv-backup-v_1_0"}' http://localhost:8080/v1/backups/filesystem/arxiv-backup-v_1_0/restore
This will load the contents of the backup into the database.
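The restore runs asynchronously, so you may want to wait for it to finish before using the database. A minimal sketch using Weaviate's restore-status endpoint (GET /v1/backups/&lt;backend&gt;/&lt;id&gt;/restore, which reports a "status" field such as "STARTED" or "SUCCESS"); stdlib only, no extra dependencies:

```python
# Sketch: poll the Weaviate backup-restore status endpoint until the
# restore reaches a terminal state.
import json
import time
import urllib.request


def restore_status_url(backup_id: str, host: str = "http://localhost:8080") -> str:
    """Build the status URL for a filesystem-backend restore."""
    return f"{host}/v1/backups/filesystem/{backup_id}/restore"


def wait_for_restore(backup_id: str, poll_seconds: float = 5.0) -> str:
    """Poll the status endpoint and return the terminal status string."""
    while True:
        with urllib.request.urlopen(restore_status_url(backup_id)) as resp:
            status = json.load(resp).get("status", "")
        if status in ("SUCCESS", "FAILED"):
            return status
        time.sleep(poll_seconds)
```

Call wait_for_restore("arxiv-backup-v_1_0") after issuing the curl command above; it blocks until the restore succeeds or fails.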
Connect through a browser at http://localhost and register a new user.
Run docker compose:
docker compose up -d
This will start the Weaviate database. Then, on Python 3.11, run:
poetry install
and then:
poetry run ./rag/indexing.py
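Before launching the indexing script, it can help to confirm that Weaviate is actually reachable. A minimal sketch using Weaviate's readiness probe (GET /v1/.well-known/ready, which returns HTTP 200 once the database can serve requests); the host and port here assume the default local setup:

```python
# Sketch: check that the local Weaviate instance is up before indexing.
import urllib.error
import urllib.request

READY_PATH = "/v1/.well-known/ready"


def is_ready(host: str = "http://localhost:8080") -> bool:
    """Return True if the readiness probe answers with HTTP 200."""
    try:
        with urllib.request.urlopen(host + READY_PATH, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, ConnectionError):
        return False
```

If is_ready() returns False, check the docker compose logs before running the indexing step.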
To create a new backup of the current database state, run:
curl -X POST -H "Content-Type: application/json" -d '{"id": "arxiv-backup-v_1_0"}' http://localhost:8080/v1/backups/filesystem
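The same backup request can be issued from Python instead of curl. A sketch against the Weaviate backup API (POST /v1/backups/&lt;backend&gt; with a JSON body containing the backup id), stdlib only:

```python
# Sketch: trigger a Weaviate filesystem backup programmatically.
import json
import urllib.request


def backup_request(backup_id: str, host: str = "http://localhost:8080") -> urllib.request.Request:
    """Build the POST request that starts a filesystem-backend backup."""
    payload = json.dumps({"id": backup_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/v1/backups/filesystem",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_backup(backup_id: str) -> dict:
    """Send the backup request and return the server's JSON response."""
    with urllib.request.urlopen(backup_request(backup_id)) as resp:
        return json.load(resp)
```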