PyFinder

An Information Retrieval S25 Project

📚 Table of Contents

📌 Contributors
💼 Requirements
🚀 Before You Start
⚡ Quick Start
🗂️ Repository Structure
📬 Contact

📌 Contributors

Dmitry Beresnev — d.beresnev@innopolis.university
Vsevolod Klyushev — v.klyushev@innopolis.university
Nikita Yaneev — n.yaneev@innopolis.university

💼 Requirements

✅ Tested on Windows 11 and Fedora Linux
🐍 Requires Python 3.12
📦 All dependencies are listed in pyproject.toml

🚀 Before You Start

Install all dependencies using uv:

uv sync

Create a .env file based on .env.example:

cp .env.example .env
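
To confirm that the copied .env still defines every variable named in .env.example, the keys of the two files can be compared. The sketch below assumes plain KEY=VALUE lines with # comments. The helper names parse_env_keys and missing_keys are illustrative and are not part of this repository.

```python
from pathlib import Path


def parse_env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=VALUE lines, skipping blanks and comments."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate an optional shell-style "export " prefix.
        if line.startswith("export "):
            line = line[len("export "):]
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, env_text: str) -> set[str]:
    """Return the names present in the example file but absent from the env file."""
    return parse_env_keys(example_text) - parse_env_keys(env_text)


if __name__ == "__main__":
    example, env = Path(".env.example"), Path(".env")
    if example.exists() and env.exists():
        missing = missing_keys(example.read_text(), env.read_text())
        print("Missing variables:", sorted(missing) if missing else "none")
    else:
        print("Expected both .env.example and .env in the current directory")
```

Run it from the repository root. It only compares variable names, so it cannot tell whether the values themselves are valid.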

Enable pre-commit hooks for auto-formatting/linting:

uv run pre-commit install
uv run pre-commit run --all-files

Important

📄 We highly recommend reading about.md to understand the workflow

⚡ Quick Start

🛠️ Setup

Important

Make sure you have installed all Python dependencies (see 🚀 Before You Start for details).

Set up the project using the script for your platform:

Windows:
```
./setup.bat
```
Linux:
```
bash ./setup.sh
```

Note

Do not worry if the script takes a while to initialize.

Warning

The setup scripts do not accept any flags. To see the available flags, run: uv run ./src/setup.py -h

Or run the corresponding Python script directly:

uv run ./src/setup.py

🏗️ Production

Start everything together:

Windows:
```
./run_prod.bat
```
Linux:
```
bash ./run_prod.sh
```

Or start the backend and frontend separately:

Backend:
```
uv run fastapi run
```
Frontend:
```
cd frontend
yarn build
yarn start
```

🧪 Development

Start everything together:

Windows:
```
./run_dev.bat
```
Linux:
```
bash ./run_dev.sh
```

Or start the backend and frontend separately:

Backend:
```
uv run fastapi dev
```
Frontend:
```
cd frontend
yarn dev
```
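
After starting the servers in either mode, it can help to confirm that both are listening. The sketch below assumes the default ports of the FastAPI CLI (8000) and Next.js (3000). The project's scripts or .env may configure different ports, so treat these numbers as assumptions. The helper is_port_open is illustrative and not part of this repository.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed default ports; adjust if the project configures others.
    for name, port in (("backend", 8000), ("frontend", 3000)):
        state = "listening" if is_port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

This only checks that something accepts TCP connections on each port. It does not verify that the process behind the port is PyFinder.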

🗂️ Repository Structure

.
├── data/                          # Data used in project
│   ├── bad_words/
│   │   └── bad_words.txt          # List of inappropriate words
│   │
│   ├── evaluation/                # Evaluation results and metrics
│   │   ├── general_metrics.json
│   │   ├── indexer_responses.json
│   │   ├── llm_metrics.json
│   │   ├── llm_responses.json
│   │   └── queries.json
│   │
│   ├── index_directory/          # Indexes and related metadata
│   │   ├── document_lengths.json
│   │   ├── document_word_count.json
│   │   ├── documents.json
│   │   └── index.json
│   │
│   ├── llm_tree_index/           # LLM-related tree index
│   │   ├── builder.json
│   │   └── tree.pkl
│   │
│   ├── scrapped/                 # Raw scraped web data
│   │   └── index_1_1.json        # Information about scraped data
│   │
│   └── spell_directory/          # Spellcheck-related files
│       ├── counter.json
│       ├── settings.json
│       └── .gitignore
│
├── frontend/                  # Next.js frontend application
│
├── pictures/                  # Images, graphs, plots
│
├── src/                       # Main source code
│   ├── notebooks/             # Jupyter notebooks
│   │   ├── bert_indexer.ipynb
│   │   ├── content_filter.ipynb
│   │   ├── evaluate.ipynb     # For metrics evaluation
│   │   ├── inverted_index.ipynb
│   │   ├── scrapper.ipynb
│   │   ├── spellcheck.ipynb
│   │   └── w2v_indexer.ipynb
│   │
│   ├── bloom.py                # Bad words filter
│   ├── inverted_index.py
│   ├── llm_indexer.py
│   ├── pipeline.py             # Complete pipelines
│   ├── rag_local.py            # RAG with local models
│   ├── rag.py                  # RAG with API
│   ├── scrapper.py             # Data scraper
│   ├── setup.py                # Main setup file
│   ├── spellcheck.py           # Norvig spell checker
│   ├── utils.py
│   └── w2v_indexer.py          # Word2Vec indexer (unsuccessful)
│
├── .env                       # Environment variables
├── .env.example               # Example environment template
├── .gitignore
├── .pre-commit-config.yaml    # Pre-commit hooks config
├── .python-version
├── about.md                   # Project description
├── main.py                    # FastAPI backend application
├── presentation.pdf           # Project presentation
├── pyproject.toml             # Dependency and tool config
├── uv.lock
└── README.md                  # Project documentation (this file)

📬 Contact

If you have any questions, feel free to reach out via the university emails listed above.
