This repository contains the Hoopla project used throughout the Boot.dev RAG course. It includes a simple keyword search CLI and a minimal inverted index.
Quick start:
- Run a search:
python -m hoopla.cli.keyword_search_cli search "your query" - Save an index with pickle (stdlib): see hoopla/README.md for a short example.
- Tokenization for improved search relevance
- Stopwords filtering to improve search quality
- Stemming using NLTK's PorterStemmer
- Enhanced partial token matching
-
Term Frequency (TF): Measures how frequently a term appears in a document
- Implemented using Counter collection for efficient token counting
- Normalizes text by removing punctuation and applying stemming
- Excludes stopwords for better relevance
-
Inverse Document Frequency (IDF): Measures how important a term is across all documents
- Calculated using the formula: log((N + 1) / (df + 1))
- N is the total number of documents
- df is the document frequency (number of documents containing the term)
- The +1 in formula provides smoothing to handle edge cases
-
TF-IDF Score: Combines TF and IDF to rank document relevance
- Higher scores indicate terms that are both frequent in a document and rare across all documents
- Used to identify distinctive terms in documents
- Search documents:
uv run cli/keyword_search_cli.py search "query" - Calculate term frequency:
uv run cli/keyword_search_cli.py tf doc_id term - Get IDF score:
uv run cli/keyword_search_cli.py idf term - Calculate TF-IDF:
uv run cli/keyword_search_cli.py tfidf doc_id term
- Minimal InvertedIndex class for efficient search
- Support for saving and loading index to disk using Python's pickle module
- Document mapping for fast document retrieval
uv is a fast Python package installer and virtual environment manager. Here's how to get started:
- Install uv:
pip install uv- Create and activate a virtual environment:
uv venv
source .venv/bin/activate # On Unix/macOS
# OR
.venv\Scripts\activate # On Windows- Install dependencies:
uv pip install -r requirements.txt- Add new dependencies:
uv pip install package_name
uv pip freeze > requirements.txt-
Dependencies Management:
- Use
requirements.txtfor package dependencies - Use
uv.lockfor dependency locking (automatically managed by uv) - Always update requirements.txt after adding new packages
- Use
-
Virtual Environment:
- Never commit .venv directory
- Always use virtual environment when developing
- One virtual environment per project
-
Package Structure:
- Keep code in the
hoopla/directory - Tests in
tests/directory - Configuration in project root
- Keep code in the
-
Code Style:
- Follow PEP 8 guidelines
- Use type hints
- Document functions and classes
- Keep functions focused and small