Local inverted-index search engine with a Flask UI—TF-IDF ranking, boolean/phrase queries, and autocomplete. Drop in .txt files and search.

leahchaku/inverted-index-search

Inverted Index Search (Advanced)

A lightweight search engine I built with Python + Flask that indexes plain-text documents and powers a fast, interactive web UI.

  • TF-IDF ranking
  • Boolean queries (AND, OR, NOT, parentheses)
  • Phrase search with positional indexes ("exact phrase")
  • Autocomplete suggestions (prefix-based; ranked by popularity)
  • Metrics: index size, build time, query latency

Drop .txt files into the data/ folder and run the app. No databases required.
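As a rough illustration of the positional index behind phrase search, here is a minimal sketch. The names `tokenize`, `build_index`, and `phrase_match` and the token pattern are assumptions made for this example, not the actual code in `search/`. It uses an in-memory dict of documents instead of reading `data/` so it stays self-contained.

```python
import re
from collections import defaultdict

# Assumed token pattern: runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and return (position, token) pairs."""
    return list(enumerate(TOKEN_RE.findall(text.lower())))


def build_index(docs):
    """Map each term to {doc_id: [positions...]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in tokenize(text):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)


def phrase_match(index, phrase):
    """Return sorted doc_ids where the phrase terms occur at consecutive positions."""
    terms = [t for _, t in tokenize(phrase)]
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can match.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    hits = []
    for doc_id in candidates:
        following = [set(index[t][doc_id]) for t in terms[1:]]
        starts = index[terms[0]][doc_id]
        # A match needs term i+1 at position start + i + 1 for every later term.
        if any(all(start + i + 1 in s for i, s in enumerate(following))
               for start in starts):
            hits.append(doc_id)
    return sorted(hits)


docs = {
    "a.txt": "Machine learning improves search.",
    "b.txt": "Learning about the machine.",
}
index = build_index(docs)
print(index["machine"])                            # {'a.txt': [0], 'b.txt': [3]}
print(phrase_match(index, '"machine learning"'))   # ['a.txt']
```

Both documents contain both words, but only `a.txt` has them next to each other in that order, which is what the stored positions make checkable.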

🚀 Demo

Start the Flask server and open:
http://localhost:5050

✨ Features

  • Tokenization → regex-based tokenizer with lowercasing; tracks positions for phrase search.
  • Inverted Index → term -> {doc_id: [positions...]} with term/document frequencies.
  • Ranking → TF-IDF scoring (tf * log((N+1)/(df+1)) + 1), where N is the number of documents and df is the term's document frequency; cosine-like, but without length normalization.
  • Boolean Queries → Shunting-yard parser with proper precedence & parentheses.
  • Phrase Search → Intersects positional postings for multi-term exact matches.
  • Autocomplete → In-memory lexicon + binary search; returns the top prefix matches by frequency.
  • Persistence → Saves index to index.pkl for fast restarts (auto-regenerates if files change).
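To make the ranking formula above concrete, here is a small sketch of one way to read it. It assumes the `+ 1` belongs to the IDF term, i.e. weight = tf * (log((N+1)/(df+1)) + 1), and that a document's score is the sum of the weights of the query terms it contains. Both are assumptions; the actual code in `search/query.py` may differ. `tfidf_scores` is a hypothetical name.

```python
import math
from collections import defaultdict


def tfidf_scores(index, num_docs, query_terms):
    """Rank documents by summed tf * idf over the query terms (no length normalization)."""
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue
        df = len(postings)  # number of documents containing the term
        # Assumed reading of the formula: smoothed IDF plus one.
        idf = math.log((num_docs + 1) / (df + 1)) + 1
        for doc_id, positions in postings.items():
            tf = len(positions)  # raw term frequency in this document
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy positional index: term -> {doc_id: [positions...]}
index = {
    "search": {"a.txt": [0, 4], "b.txt": [2]},
    "index": {"a.txt": [1]},
}
ranked = tfidf_scores(index, num_docs=3, query_terms=["search", "index"])
for doc_id, score in ranked:
    print(f"{doc_id}\t{score:.3f}")
```

Because nothing is divided by document length, longer documents that repeat a term more often tend to score higher. That is the trade-off implied by "without length normalization".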

⚡ Quick Start

# (Optional) create a virtual environment
python3 -m venv .venv && source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py

📂 Project Structure

inverted-index-search/
├── app.py                  # Flask server & HTTP API
├── search/
│   ├── indexer.py          # Index building & persistence
│   ├── query.py            # Query parsing, TF-IDF ranking, phrase & boolean search
│   └── utils.py            # Tokenization and helpers
├── templates/
│   └── index.html          # Web UI template
├── static/
│   ├── app.js              # Client-side search + autocomplete
│   └── styles.css          # Minimal styling
├── data/                   # .txt documents to index
├── requirements.txt
└── README.md
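For the autocomplete feature (in-memory lexicon plus binary search), a sketch along these lines is one plausible shape. `build_lexicon`, `suggest`, and the sample frequencies are made up for illustration and are not taken from the repository.

```python
from bisect import bisect_left


def build_lexicon(term_freqs):
    """Return terms in sorted order so that shared prefixes form one contiguous run."""
    return sorted(term_freqs)


def suggest(lexicon, term_freqs, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    if not prefix:
        return []
    # First term >= prefix.
    lo = bisect_left(lexicon, prefix)
    # First term >= prefix followed by the largest code point; ordinary terms
    # starting with the prefix all sort before this sentinel.
    hi = bisect_left(lexicon, prefix + "\U0010ffff")
    matches = lexicon[lo:hi]
    return sorted(matches, key=lambda t: (-term_freqs[t], t))[:limit]


# Hypothetical term frequencies collected at index time.
term_freqs = {"search": 12, "season": 3, "seat": 7, "index": 9, "sea": 1}
lexicon = build_lexicon(term_freqs)
print(suggest(lexicon, term_freqs, "sea", limit=3))  # ['search', 'seat', 'season']
```

The two binary searches locate the matching slice in O(log n). Only that slice is then sorted by frequency.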

🔍 Example Queries

information retrieval

"machine learning"

neural AND network

(index OR search) AND NOT database
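The boolean examples above can be evaluated with a shunting-yard pass followed by set operations on postings. The sketch below assumes uppercase operators, with NOT binding tighter than AND and AND tighter than OR. `to_postfix` and `evaluate` are hypothetical names, and malformed queries are not fully validated.

```python
import re

# Assumed precedence: NOT > AND > OR. Operators must be uppercase.
PRECEDENCE = {"NOT": 3, "AND": 2, "OR": 1}


def to_postfix(query):
    """Convert an infix boolean query to postfix using the shunting-yard algorithm."""
    output, stack = [], []
    for tok in re.findall(r"\(|\)|[A-Za-z0-9]+", query):
        if tok in PRECEDENCE:
            # AND/OR are left-associative; NOT is a prefix operator and never pops.
            while (stack and stack[-1] != "(" and
                   (PRECEDENCE[stack[-1]] > PRECEDENCE[tok] or
                    (PRECEDENCE[stack[-1]] == PRECEDENCE[tok] and tok != "NOT"))):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok.lower())  # search terms are lowercased
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


def evaluate(postfix, index, all_docs):
    """Evaluate a postfix query against term -> {doc_id: positions} postings."""
    stack = []
    for tok in postfix:
        if tok == "NOT":
            stack.append(all_docs - stack.pop())
        elif tok in ("AND", "OR"):
            right, left = stack.pop(), stack.pop()
            stack.append(left & right if tok == "AND" else left | right)
        else:
            stack.append(set(index.get(tok, ())))
    return stack.pop()


index = {
    "index": {"a.txt": [0]},
    "search": {"a.txt": [1], "b.txt": [0]},
    "database": {"b.txt": [3], "c.txt": [0]},
}
all_docs = {"a.txt", "b.txt", "c.txt"}
postfix = to_postfix("(index OR search) AND NOT database")
print(postfix)  # ['index', 'search', 'OR', 'database', 'NOT', 'AND']
print(evaluate(postfix, index, all_docs))  # {'a.txt'}
```

NOT is computed as the complement against the full document set, so the evaluator needs the set of all doc_ids in addition to the postings.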

🛠 Tech Stack

Python · Flask · Information Retrieval / NLP · HTML · CSS · JavaScript

📸 Preview

[Screenshot, 2025-08-23]

📜 License

MIT
