dsomni/pyfinder-ir-s25

PyFinder

An Information Retrieval S25 Project


📚 Table of Contents


📌 Contributors


💼 Requirements

  • ✅ Tested on Windows 11 and Fedora Linux
  • 🐍 Requires Python 3.12
  • 📦 All dependencies are listed in pyproject.toml

🚀 Before You Start

Install all dependencies using uv:

uv sync

Create a .env file based on .env.example:

cp .env.example .env
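`cp` works in a POSIX shell and as a PowerShell alias, but not in cmd.exe, and it overwrites an existing .env. Below is a minimal cross-platform sketch that copies the template only when .env is missing and then lists variables that still need a value. It assumes .env.example uses plain `KEY=VALUE` lines with `#` comments; `ensure_env` and `read_env_keys` are hypothetical helpers, not part of the repository.

```python
# Hypothetical helper: create .env from .env.example without overwriting it.
import shutil
from pathlib import Path


def read_env_keys(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def ensure_env(
    example: Path = Path(".env.example"), target: Path = Path(".env")
) -> list[str]:
    """Copy the template only if .env is missing; return keys that are empty or absent."""
    if not target.exists():
        shutil.copyfile(example, target)
    template_keys = read_env_keys(example)
    current = read_env_keys(target)
    # Keep the template's order so the report matches .env.example.
    return [key for key in template_keys if not current.get(key)]
```

Run `ensure_env()` from the repository root; an empty list means every variable from the template has a value in .env.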

Install the pre-commit hooks for automatic formatting and linting, then run them once on all files:

uv run pre-commit install
uv run pre-commit run --all-files

Important

📄 We highly recommend reading about.md to understand the project workflow.


⚡ Quick Start

🛠️ Setup

Important

Make sure that you have installed all the Python dependencies (see 🚀 Before You Start for details).

Set up the project using the setup script:

  • Windows:

    ./setup.bat
  • Linux:

    bash ./setup.sh

Note

Do not worry: the script sometimes takes a while to initialize.

Warning

The setup scripts do not accept any flags. To see the available flags, run: uv run ./src/setup.py -h

Or run the corresponding Python script directly, which does accept flags:

uv run ./src/setup.py

🏗️ Production

Start everything together:

  • Windows:

    ./run_prod.bat
  • Linux:

    bash ./run_prod.sh

Or start the frontend and backend separately:

  • Backend:

    uv run fastapi run
  • Frontend:

    cd frontend
    yarn build
    yarn start
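Once the services are running, a quick reachability check can confirm that both respond. This is a minimal sketch under assumed defaults: `fastapi run` listens on port 8000 and FastAPI serves interactive docs at /docs unless main.py disables them, and the Next.js frontend started with `yarn start` listens on its default port 3000. `ENDPOINTS`, `is_up`, and `report` are hypothetical names, not part of the repository.

```python
# Hypothetical reachability check for the backend and frontend (assumed default URLs).
import urllib.error
import urllib.request

# Assumptions: backend on port 8000 with /docs enabled; frontend on port 3000.
ENDPOINTS = {
    "backend": "http://127.0.0.1:8000/docs",
    "frontend": "http://127.0.0.1:3000/",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError, ConnectionError):
        # Connection refused, DNS failure, HTTP 4xx/5xx (HTTPError), or a timeout.
        return False


def report() -> dict[str, bool]:
    """Map each service name in ENDPOINTS to whether it is reachable."""
    return {name: is_up(url) for name, url in ENDPOINTS.items()}
```

Adjust the URLs if you changed the ports; `report()` returns `True` for every service that answered.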

🧪 Development

Start everything together:

  • Windows:

    ./run_dev.bat
  • Linux:

    bash ./run_dev.sh

Or start the frontend and backend separately:

  • Backend:

    uv run fastapi dev
  • Frontend:

    cd frontend
    yarn dev

🗂️ Repository Structure

.
├── data/                          # Data used in the project
│   ├── bad_words/
│   │   └── bad_words.txt          # List of inappropriate words
│   │
│   ├── evaluation/                # Evaluation results and metrics
│   │   ├── general_metrics.json
│   │   ├── indexer_responses.json
│   │   ├── llm_metrics.json
│   │   ├── llm_responses.json
│   │   └── queries.json
│   │
│   ├── index_directory/          # Indexes and related metadata
│   │   ├── document_lengths.json
│   │   ├── document_word_count.json
│   │   ├── documents.json
│   │   └── index.json
│   │
│   ├── llm_tree_index/           # LLM-related tree index
│   │   ├── builder.json
│   │   └── tree.pkl
│   │
│   ├── scrapped/                 # Raw scraped web data
│   │   └── index_1_1.json        # Information about scraped data
│   │
│   └── spell_directory/          # Spellcheck-related files
│       ├── counter.json
│       ├── settings.json
│       └── .gitignore
│
├── frontend/                  # Next.js frontend application
│
├── pictures/                  # Images, graphs, plots
│
├── src/                       # Main source code
│   ├── notebooks/             # Jupyter notebooks
│   │   ├── bert_indexer.ipynb
│   │   ├── content_filter.ipynb
│   │   ├── evaluate.ipynb     # For metrics evaluation
│   │   ├── inverted_index.ipynb
│   │   ├── scrapper.ipynb
│   │   ├── spellcheck.ipynb
│   │   └── w2v_indexer.ipynb
│   │
│   ├── bloom.py                # Bad words filter
│   ├── inverted_index.py
│   ├── llm_indexer.py
│   ├── pipeline.py             # Complete pipelines
│   ├── rag_local.py            # RAG with local models
│   ├── rag.py                  # RAG with API
│   ├── scrapper.py             # Data scraper
│   ├── setup.py                # Main setup file
│   ├── spellcheck.py           # Norvig spell checker
│   ├── utils.py
│   └── w2v_indexer.py          # Word2Vec indexer (unsuccessful experiment)
│
├── .env                       # Environment variables
├── .env.example               # Example environment template
├── .gitignore
├── .pre-commit-config.yaml    # Pre-commit hooks config
├── .python-version
├── about.md                   # Project description
├── main.py                    # FastAPI backend application
├── presentation.pdf           # Project presentation
├── pyproject.toml             # Dependency and tool config
├── uv.lock
└── README.md                  # Project documentation (this file)
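The tree lists bloom.py as a bad-words filter; judging by its name, it presumably relies on a Bloom filter. The sketch below is a generic Bloom filter, not the project's code: it sizes the bit array with the standard formulas $m = -n \ln p / (\ln 2)^2$ and $k = (m/n)\ln 2$ and derives the $k$ positions by double hashing one SHA-256 digest. `BloomFilter` and `load_bad_words` are hypothetical names, and the one-word-per-line format for bad_words.txt is an assumption.

```python
# Generic Bloom filter sketch for a bad-words list (not the repository's bloom.py).
import hashlib
import math
from collections.abc import Iterator


class BloomFilter:
    """Probabilistic set: never misses an added item, may report false positives."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        capacity = max(1, capacity)
        # m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.hash_count = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: position_i = (h1 + i * h2) mod m, from two halves of a digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the step is never zero
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def load_bad_words(path: str, error_rate: float = 0.01) -> BloomFilter:
    """Build a filter from a file with one word per line (assumed format)."""
    with open(path, encoding="utf-8") as f:
        words = [line.strip().lower() for line in f if line.strip()]
    bloom = BloomFilter(capacity=len(words), error_rate=error_rate)
    for word in words:
        bloom.add(word)
    return bloom
```

The trade-off is deliberate: a listed word is never missed, while a clean word is occasionally flagged at roughly the configured `error_rate`, in exchange for a compact bit array instead of storing every word.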
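spellcheck.py is described as a Norvig spell checker, and data/spell_directory/ holds a counter.json. The sketch below follows the well-known approach from Peter Norvig's essay: generate candidates one and two edits away, keep the known ones, and pick the most frequent. It is not the project's implementation; `SpellChecker` is a hypothetical class, and the assumption that counter.json is a flat `{"word": count}` mapping is unverified.

```python
# Norvig-style spelling corrector sketch (not the repository's spellcheck.py).
import json
from collections import Counter
from collections.abc import Iterable, Iterator

LETTERS = "abcdefghijklmnopqrstuvwxyz"


class SpellChecker:
    def __init__(self, counts: dict[str, int]) -> None:
        self.counts = Counter(counts)
        self.total = sum(self.counts.values())

    @classmethod
    def from_json(cls, path: str) -> "SpellChecker":
        # Assumption: the file holds a flat {"word": count} mapping.
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))

    def probability(self, word: str) -> float:
        """Relative frequency of `word` in the corpus counts."""
        return self.counts[word] / self.total if self.total else 0.0

    def known(self, words: Iterable[str]) -> set[str]:
        return {w for w in words if w in self.counts}

    @staticmethod
    def edits1(word: str) -> set[str]:
        """All strings one delete, transpose, replace, or insert away from `word`."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        transposes = [
            left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1
        ]
        replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
        inserts = [left + c + right for left, right in splits for c in LETTERS]
        return set(deletes + transposes + replaces + inserts)

    def edits2(self, word: str) -> Iterator[str]:
        return (e2 for e1 in self.edits1(word) for e2 in self.edits1(e1))

    def correction(self, word: str) -> str:
        # Prefer the word itself, then known words at distance 1, then at distance 2.
        candidates = (
            self.known([word])
            or self.known(self.edits1(word))
            or self.known(self.edits2(word))
            or {word}
        )
        return max(candidates, key=self.probability)
```

For example, with counts `{"spelling": 10, "the": 100}`, `correction("speling")` returns `"spelling"` and an unknown word with no close match is returned unchanged.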
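The tree also includes inverted_index.py alongside index files such as index.json and document_lengths.json. As a rough illustration of the data structure only, here is a toy inverted index that maps each term to per-document term frequencies and answers Boolean AND queries. The tokenizer, the `InvertedIndex` class, and the AND semantics are assumptions for this sketch; the project's indexer may tokenize, store, and rank differently.

```python
# Toy inverted index sketch (illustrative; the repository's inverted_index.py may differ).
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9_]+")


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens, no stemming or stop-word removal.
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> {doc_id: term frequency in that document}
        self.postings: defaultdict[str, dict[int, int]] = defaultdict(dict)
        # doc_id -> number of tokens in the document
        self.doc_lengths: dict[int, int] = {}

    def add(self, doc_id: int, text: str) -> None:
        tokens = tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for token in tokens:
            self.postings[token][doc_id] = self.postings[token].get(doc_id, 0) + 1

    def search(self, query: str) -> list[int]:
        """Boolean AND: sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect the smallest posting sets first to keep intermediate results small.
        doc_sets = sorted((set(self.postings.get(t, {})) for t in terms), key=len)
        return sorted(set.intersection(*doc_sets))
```

Storing term frequencies and document lengths, rather than bare document ids, keeps enough information for a later ranking step such as TF-IDF, even though this sketch only filters.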

📬 Contact

If you have any questions, feel free to reach out via the university emails listed above.
