An Information Retrieval S25 Project
- Dmitry Beresnev — d.beresnev@innopolis.university
- Vsevolod Klyushev — v.klyushev@innopolis.university
- Nikita Yaneev — n.yaneev@innopolis.university
- ✅ Tested on Windows 11 and Fedora Linux
- 🐍 Requires Python 3.12
- 📦 All dependencies are listed in pyproject.toml
Install all dependencies using uv:
uv sync
Create a .env file based on .env.example:
cp .env.example .env
Enable pre-commit hooks for auto-formatting/linting:
uv run pre-commit install
uv run pre-commit run --all-files
Important
📄 We highly recommend reading about.md to understand the workflow
Important
Make sure that you have installed all the Python dependencies (see 🚀 Before You Start for details).
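Before running the setup script, the prerequisites above can be checked automatically. The following is a minimal sketch, not part of the repository: the helper names `read_env_keys` and `preflight` are hypothetical, and it assumes .env and .env.example use plain `KEY=VALUE` lines. It reports a Python version other than 3.12, a missing .env file, and keys that appear in .env.example but not in .env.

```python
import sys
from pathlib import Path

REQUIRED_PYTHON = (3, 12)  # the version this README names


def read_env_keys(path):
    """Collect variable names from a dotenv-style file (one KEY=VALUE per line)."""
    keys = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate the optional "export KEY=VALUE" form.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def preflight(root=".", current_python=None):
    """Return a list of problems found; an empty list means the checks passed."""
    if current_python is None:
        current_python = sys.version_info[:2]
    problems = []
    if tuple(current_python) != REQUIRED_PYTHON:
        problems.append("Python %d.%d found, but the README names 3.12" % tuple(current_python))
    env = Path(root) / ".env"
    example = Path(root) / ".env.example"
    if not env.is_file():
        problems.append(".env not found: copy .env.example to .env")
    elif example.is_file():
        # Keys present in the template but absent from the local file.
        missing = sorted(read_env_keys(example) - read_env_keys(env))
        if missing:
            problems.append(".env lacks keys from .env.example: " + ", ".join(missing))
    return problems
```

Calling `preflight()` from the repository root and printing the returned list gives a quick summary of what still needs fixing.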
Set up the project using the script:
- Windows:
  ./setup.bat
- Linux:
  bash ./setup.sh
Note
Do not worry if the script takes a while to initialize.
Warning
The setup scripts do not accept any flags. For a description of the available flags, run: uv run ./src/setup.py -h
Or run the corresponding Python script directly:
uv run ./src/setup.py
Start everything together in production mode:
- Windows:
  ./run_prod.bat
- Linux:
  bash ./run_prod.sh
Or start the frontend and backend separately:
- Backend:
  uv run fastapi run
- Frontend:
  cd frontend
  yarn build
  yarn start
Start everything together in development mode:
- Windows:
  ./run_dev.bat
- Linux:
  bash ./run_dev.sh
Or start the frontend and backend separately:
- Backend:
  uv run fastapi dev
- Frontend:
  cd frontend
  yarn dev
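The separate backend and frontend commands above can also be combined in a small cross-platform launcher. This is a sketch under stated assumptions, not the contents of the run_dev or run_prod scripts (which this README does not show): the function names `plan`, `resolve`, and `launch` are hypothetical, and the mapping of "dev" and "prod" to the commands is inferred from the script names. The backend commands differ in that `fastapi dev` enables auto-reload, while `fastapi run` is meant for production.

```python
import shutil
import subprocess


def plan(mode):
    """Return (prepare, serve): steps to run to completion, then processes to keep alive.

    Each entry is (argv, working_directory), relative to the repository root.
    """
    if mode not in ("dev", "prod"):
        raise ValueError("mode must be 'dev' or 'prod'")
    # The production frontend has to be built before it can be started.
    prepare = [(["yarn", "build"], "frontend")] if mode == "prod" else []
    serve = [
        (["uv", "run", "fastapi", "dev" if mode == "dev" else "run"], "."),
        (["yarn", "dev" if mode == "dev" else "start"], "frontend"),
    ]
    return prepare, serve


def resolve(argv):
    """Replace the program name with its full path (finds e.g. yarn.cmd on Windows)."""
    exe = shutil.which(argv[0])
    if exe is None:
        raise FileNotFoundError(argv[0] + " not found on PATH")
    return [exe] + argv[1:]


def launch(mode):
    """Run the preparation steps, then start backend and frontend side by side."""
    prepare, serve = plan(mode)
    for argv, cwd in prepare:
        subprocess.run(resolve(argv), cwd=cwd, check=True)
    procs = [subprocess.Popen(resolve(argv), cwd=cwd) for argv, cwd in serve]
    try:
        # Wait for both processes; Ctrl+C interrupts the wait.
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        pass
    finally:
        # Stop whatever is still running.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
```

Calling `launch("dev")` from the repository root would start both services; `plan("prod")` can be inspected first to see which commands would run.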
.
├── data/  # Data used in project
│   ├── bad_words/
│   │   └── bad_words.txt  # List of inappropriate words
│   │
│   ├── evaluation/  # Evaluation results and metrics
│   │   ├── general_metrics.json
│   │   ├── indexer_responses.json
│   │   ├── llm_metrics.json
│   │   ├── llm_responses.json
│   │   └── queries.json
│   │
│   ├── index_directory/  # Indexes and related metadata
│   │   ├── document_lengths.json
│   │   ├── document_word_count.json
│   │   ├── documents.json
│   │   └── index.json
│   │
│   ├── llm_tree_index/  # LLM-related tree index
│   │   ├── builder.json
│   │   └── tree.pkl
│   │
│   ├── scrapped/  # Raw scraped web data
│   │   └── index_1_1.json  # Information about scraped data
│   │
│   └── spell_directory/  # Spellcheck-related files
│       ├── counter.json
│       ├── settings.json
│       └── .gitignore
│
├── frontend/  # Next.js frontend application
│
├── pictures/  # Images, graphs, plots
│
├── src/  # Main source code
│   ├── notebooks/  # Jupyter notebooks
│   │   ├── bert_indexer.ipynb
│   │   ├── content_filter.ipynb
│   │   ├── evaluate.ipynb  # For metrics evaluation
│   │   ├── inverted_index.ipynb
│   │   ├── scrapper.ipynb
│   │   ├── spellcheck.ipynb
│   │   └── w2v_indexer.ipynb
│   │
│   ├── bloom.py  # Bad words filter
│   ├── inverted_index.py
│   ├── llm_indexer.py
│   ├── pipeline.py  # Complete pipelines
│   ├── rag_local.py  # RAG with local models
│   ├── rag.py  # RAG with API
│   ├── scrapper.py  # Data scraper
│   ├── setup.py  # Main setup file
│   ├── spellcheck.py  # Norvig spell checker
│   ├── utils.py
│   └── w2v_indexer.py  # Unsuccessful Word2Vec
│
├── .env  # Environment variables
├── .env.example  # Example environment template
├── .gitignore
├── .pre-commit-config.yaml  # Pre-commit hooks config
├── .python-version
├── about.md  # Project description
├── main.py  # FastAPI backend application
├── presentation.pdf  # Project presentation
├── pyproject.toml  # Dependency and tool config
├── uv.lock
└── README.md  # Project documentation (this file)
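The tree lists src/bloom.py as the bad words filter. The file name suggests a Bloom filter, although this README does not describe the implementation. As an illustration of that technique only, here is a minimal sketch; the class name, default sizes, and hashing scheme are assumptions and may differ from the project's code. A Bloom filter never misses a word that was added, but it can occasionally report a word that was not.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash positions per word (illustrative sketch)."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, word):
        # Lowercase so that lookups are case-insensitive.
        data = word.lower().encode("utf-8")
        for seed in range(self.num_hashes):
            # A different salt per seed gives independent-looking hash functions.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # True only if every position for this word is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word)
        )
```

In a filter like this, each line of a word list such as data/bad_words/bad_words.txt would be passed to `add`, and incoming query tokens would then be tested with `token in bloom`.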
If you have any questions, feel free to reach out via the university emails listed above.