A console-based Mini Search Engine built using Core Java, implementing key concepts from Data Structures, Algorithms, and Information Retrieval systems.
The project supports fast keyword-based search, partial-word queries, and relevance-based ranking using TF-IDF, similar to how real-world search engines work internally.
- ✅ Keyword-based document search
- ✅ Supports partially typed words (prefix search)
- ✅ Efficient relevance ranking using TF-IDF
- ✅ Returns Top 5 most relevant documents
- ✅ Highlights matched keywords in results
- ✅ Optimized for performance (no combinatorial explosion)
- ✅ Clean, modular, interview-ready code
- Documents are indexed using an Inverted Index
- User query is tokenized
- Partial words are expanded using a Trie
  - Example: ba → backend
- Expanded keywords are merged into a single query
- TF-IDF scores are computed for each document
- Documents are ranked and Top 5 results are returned
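
The indexing and tokenization steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class name `InvertedIndexSketch`, the method names, and the choice to split on non-alphanumeric characters are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: names are illustrative and not taken from the project.
class InvertedIndexSketch {
    // term -> (document id -> number of times the term occurs in that document)
    final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    final List<String> documents = new ArrayList<>();

    // Assumed normalization: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Store the document and record every term occurrence in the postings map.
    int addDocument(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("Spring Boot is a Java framework for backend systems");
        index.addDocument("Java backend developers often use Spring");
        // Print the ids of the documents that contain each term.
        System.out.println("java -> " + index.postings.get("java").keySet());
        System.out.println("framework -> " + index.postings.get("framework").keySet());
    }
}
```

Keeping the per-document term count inside the postings map means the term frequency needed for TF-IDF is available at query time without rescanning the document text.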
- Naive approach: generate every combination of the prefix-expanded words
- This leads to exponential time complexity: O(k^p × D × log D)
- Optimized approach: no combinatorial query generation
- Single-pass TF-IDF computation over all expanded keywords
- A Priority Queue keeps only the Top 5 results
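
A single-pass scoring loop with a bounded heap might look like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: the class `TfIdfRankerSketch`, the method `topK`, the postings layout (term to document id to term count), and the smoothed IDF formula `log(1 + N / df)` are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch: names and the exact TF-IDF variant are assumptions.
class TfIdfRankerSketch {
    // postings: term -> (document id -> term count); returns up to k doc ids, best first.
    static List<Integer> topK(Map<String, Map<Integer, Integer>> postings,
                              int totalDocs, Set<String> terms, int k) {
        // One pass over the postings of every expanded term, accumulating scores per document.
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : terms) {
            Map<Integer, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in any document
            }
            double idf = Math.log(1.0 + (double) totalDocs / docs.size());
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }

        // Min-heap ordered by score: once it holds more than k entries, evict the lowest.
        PriorityQueue<Map.Entry<Integer, Double>> heap =
                new PriorityQueue<>((x, y) -> Double.compare(x.getValue(), y.getValue()));
        for (Map.Entry<Integer, Double> e : scores.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll();
            }
        }

        // The heap yields lowest score first, so reverse to get the best document first.
        List<Integer> ranked = new ArrayList<>();
        while (!heap.isEmpty()) {
            ranked.add(heap.poll().getKey());
        }
        Collections.reverse(ranked);
        return ranked;
    }
}
```

Because the heap never grows beyond `k + 1` entries, selecting the Top 5 avoids sorting every matching document.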
Final Complexity: O(T × avgDocs)

Where:
- T = number of expanded query terms
- avgDocs = average number of documents per term
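➡️ Because the cost grows with the number of expanded terms rather than with the number of term combinations, the engine stays scalable as queries get longer.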
| Data Structure | Purpose |
|---|---|
| HashMap | Inverted Index, document storage |
| Trie | Prefix-based word expansion |
| PriorityQueue | Ranking top search results |
| Set | Remove duplicate keywords |
| List | Store documents and results |
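
To illustrate how a Trie can drive prefix expansion, here is a small sketch. The class `TrieSketch` and its methods are hypothetical, and the use of a `TreeMap` for child nodes (so that expansions come back in alphabetical order) is an assumption of this example, not something the project states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names and the TreeMap-based child storage are assumptions.
class TrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Walk or create one node per character, then mark the final node as a complete word.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Return every indexed word that starts with the given prefix.
    List<String> expand(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return new ArrayList<>(); // no indexed word has this prefix
            }
        }
        List<String> matches = new ArrayList<>();
        collect(node, new StringBuilder(prefix), matches);
        return matches;
    }

    // Depth-first traversal of the subtree below the prefix node.
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack before visiting the next child
        }
    }
}
```

With the words `java`, `backend`, and `framework` inserted, `expand("ba")` returns a list containing only `backend`, matching the ba → backend expansion described earlier. Collecting the expansions of all query tokens into a `Set` would then remove duplicates before scoring.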
- Inverted Indexing
- TF-IDF (Term Frequency – Inverse Document Frequency)
- Trie-based Prefix Search
- Greedy Ranking using Priority Queue
- String Tokenization & Normalization
- Java JDK 8 or above
- Git
- Terminal / Command Prompt / WSL
git clone https://github.com/adarsh-7-satyam/search-engine-java
cd search-engine-java
javac *.java
java Main

Sample query:
jav ba fr
Expanded keywords used for search:
- java
- backend
- framework
Top 5 Search Results:
- Spring FRAMEWORK simplifies Java BACKEND development
- Spring Boot is a Java FRAMEWORK for BACKEND systems
- Java BACKEND developers often use Spring FRAMEWORK ...
- Practical understanding of search engine internals
- Efficient use of DSA in real-world problems
- Experience with performance optimization
- Clean separation of search logic and presentation
- Stop-word removal
- Cosine similarity ranking
- Spell correction
- REST API using Spring Boot
- Web-based UI
Adarsh Satyam, B.Tech CSE, IIT Bhilai