A lightweight search engine I built with Python + Flask that indexes plain-text documents and powers a fast, interactive web UI.
- TF-IDF ranking
- Boolean queries (`AND`, `OR`, `NOT`, parentheses)
- Phrase search with positional indexes (`"exact phrase"`)
- Autocomplete suggestions (prefix-based; ranked by popularity)
- Metrics: index size, build time, query latency
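
As a rough illustration of the prefix-based autocomplete listed above, the sketch below keeps a sorted list of terms, uses binary search to find where a prefix starts, and ranks the matches by frequency. The function name `suggest`, its parameters, and the sample data are hypothetical and not taken from the project's code.

```python
import bisect


def suggest(lexicon, frequencies, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix`, most frequent first.

    `lexicon` must be a sorted list of terms; `frequencies` maps term -> count.
    """
    prefix = prefix.lower()
    if not prefix:
        return []
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(lexicon, prefix)
    matches = []
    # In a sorted list, all terms sharing the prefix are contiguous.
    for i in range(start, len(lexicon)):
        if not lexicon[i].startswith(prefix):
            break
        matches.append(lexicon[i])
    # Rank by descending frequency, breaking ties alphabetically.
    matches.sort(key=lambda term: (-frequencies.get(term, 0), term))
    return matches[:limit]


# Hypothetical sample data for illustration only.
frequencies = {"search": 10, "seat": 2, "sea": 5, "index": 7}
lexicon = sorted(frequencies)
top_two = suggest(lexicon, frequencies, "sea", limit=2)
```

The binary search only locates the start of the matching range; the scan that follows is proportional to the number of matches, which is usually small for prefixes of two or more characters.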
Drop `.txt` files into the `data/` folder and run the app. No databases required.
Start the Flask server and open:
http://localhost:5050
- Tokenization → regex-based tokenizer with lowercasing; tracks positions for phrase search.
- Inverted Index → `term -> {doc_id: [positions...]}` with term/document frequencies.
- Ranking → TF-IDF scoring (`tf * log((N+1)/(df+1)) + 1`), cosine-like without normalization.
- Boolean Queries → Shunting-yard parser with proper precedence & parentheses.
- Phrase Search → Intersects positional postings for multi-term exact matches.
- Autocomplete → In-memory lexicon + binary search; returns top prefixes by frequency.
- Persistence → Saves index to `index.pkl` for fast restarts (auto-regenerates if files change).
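
To make the tokenization, positional index, ranking, and phrase-search steps above concrete, here is a minimal sketch. It is not the project's actual code: the function names, the token regex, and the sample documents are assumptions. The scoring formula is read here as `tf` multiplied by a smoothed IDF, `log((N+1)/(df+1)) + 1`; that grouping is also an assumption.

```python
import math
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")  # assumed pattern; the project's regex may differ


def tokenize(text):
    """Lowercase the text and return (position, token) pairs."""
    return list(enumerate(TOKEN_RE.findall(text.lower())))


def build_index(docs):
    """Build term -> {doc_id: [positions...]} from a {doc_id: text} mapping."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in tokenize(text):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def tfidf_scores(index, n_docs, query):
    """Sum tf * (log((N+1)/(df+1)) + 1) per query term, without length normalization."""
    scores = defaultdict(float)
    for _, term in tokenize(query):
        postings = index.get(term, {})
        idf = math.log((n_docs + 1) / (len(postings) + 1)) + 1  # assumed grouping
        for doc_id, positions in postings.items():
            scores[doc_id] += len(positions) * idf  # tf = number of positions
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def phrase_match(index, phrase):
    """Return the doc ids where the phrase terms occur at consecutive positions."""
    terms = [term for _, term in tokenize(phrase)]
    if not terms:
        return set()
    # Only documents containing every term can match.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = set()
    for doc_id in candidates:
        later = [set(index[term][doc_id]) for term in terms[1:]]
        for start in index[terms[0]][doc_id]:
            # Term k of the phrase must sit exactly k positions after the first term.
            if all(start + k + 1 in positions for k, positions in enumerate(later)):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample documents for illustration only.
docs = {
    "a": "machine learning is fun",
    "b": "learning about the machine",
    "c": "search index",
}
index = build_index(docs)
phrase_hits = phrase_match(index, '"machine learning"')
ranked = tfidf_scores(index, len(docs), "search")
```

Both `a` and `b` contain the two words, but only `a` has them adjacent and in order, so it is the only phrase hit.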
# (Optional) create a virtual environment
python3 -m venv .venv && source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the app
python app.py
inverted-index-search/
├── app.py # Flask server & HTTP API
├── search/
│ ├── indexer.py # Index building & persistence
│ ├── query.py # Query parsing, TF-IDF ranking, phrase & boolean search
│ └── utils.py # Tokenization and helpers
├── templates/
│ └── index.html # Web UI template
├── static/
│ ├── app.js # Client-side search + autocomplete
│ └── styles.css # Minimal styling
├── data/
├── requirements.txt
└── README.md
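
The tree above lists `indexer.py` as handling index persistence. One way to get "load from `index.pkl` unless the data files changed" is sketched below. This is not the project's code: the function names, the signature format (file name, modification time, size), and the `build` callback are assumptions.

```python
import os
import pickle


def data_signature(data_dir):
    """Return a sorted list of (filename, mtime_ns, size) for the .txt files in data_dir."""
    signature = []
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(".txt"):
            info = os.stat(os.path.join(data_dir, name))
            signature.append((name, info.st_mtime_ns, info.st_size))
    return signature


def load_or_build(data_dir, index_path, build):
    """Load the pickled index if the data files are unchanged; otherwise rebuild and save."""
    signature = data_signature(data_dir)
    if os.path.exists(index_path):
        # Only unpickle files this app wrote itself; pickle is unsafe on untrusted input.
        with open(index_path, "rb") as handle:
            cached = pickle.load(handle)
        if cached.get("signature") == signature:
            return cached["index"]
    index = build(data_dir)  # hypothetical callback that builds the index from data_dir
    with open(index_path, "wb") as handle:
        pickle.dump({"signature": signature, "index": index}, handle)
    return index
```

Storing the signature next to the index means adding, removing, or editing any `.txt` file invalidates the cache on the next start.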
information retrieval
"machine learning"
neural AND network
(index OR search) AND NOT database
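
For the boolean examples above, the following sketch shows how a shunting-yard pass can turn a query into postfix form and evaluate it over sets of document ids. It is an illustration under assumptions, not the project's parser: the function names, the uppercase-only operators, and the sample postings are hypothetical.

```python
import re

PRECEDENCE = {"NOT": 3, "AND": 2, "OR": 1}
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def to_postfix(query):
    """Convert an infix boolean query to postfix with the shunting-yard algorithm."""
    output, stack = [], []
    for token in TOKEN_RE.findall(query):
        if token in PRECEDENCE:
            # NOT is a prefix operator and never pops; AND/OR are left-associative.
            while (token != "NOT" and stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the "("
        else:
            output.append(token.lower())  # search terms are lowercased
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


def evaluate(query, postings, all_docs):
    """Evaluate a boolean query against term -> set-of-doc-ids postings."""
    stack = []
    for token in to_postfix(query):
        if token == "NOT":
            stack.append(all_docs - stack.pop())
        elif token in ("AND", "OR"):
            right, left = stack.pop(), stack.pop()
            stack.append(left & right if token == "AND" else left | right)
        else:
            stack.append(set(postings.get(token, ())))
    return stack.pop() if stack else set()


# Hypothetical postings for illustration only.
postings = {"index": {1, 2}, "search": {2, 3}, "database": {2},
            "neural": {1}, "network": {1, 3}}
all_docs = {1, 2, 3, 4}
postfix = to_postfix("(index OR search) AND NOT database")
result = evaluate("(index OR search) AND NOT database", postings, all_docs)
```

Because `NOT` is evaluated as a complement against `all_docs`, the evaluator needs the full set of document ids in addition to the postings.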
Python · Flask · Information Retrieval / NLP · HTML · CSS · JavaScript

MIT