🔍 Mini Search Engine in Java

A console-based Mini Search Engine built using Core Java, implementing key concepts from Data Structures, Algorithms, and Information Retrieval systems.

The project supports fast keyword-based search, partial-word queries, and relevance-based ranking using TF-IDF, similar to how real-world search engines work internally.

🚀 Features

✅ Keyword-based document search
✅ Supports partially typed words (prefix search)
✅ Efficient relevance ranking using TF-IDF
✅ Returns Top 5 most relevant documents
✅ Highlights matched keywords in results
✅ Optimized for performance (no combinatorial explosion)
✅ Clean, modular, interview-ready code

🧠 How the Search Engine Works

Documents are indexed using an Inverted Index
User query is tokenized
Partial words are expanded using a Trie
- Example: ba → backend
Expanded keywords are merged into a single query
TF-IDF scores are computed for each document
Documents are ranked and Top 5 results are returned
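
The prefix-expansion step above can be illustrated with a small trie. This is a minimal sketch rather than the repository's `Trie.java`: the class name `PrefixTrieSketch` and its methods `insert` and `expand` are hypothetical, and it assumes words are already lowercased.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the project's Trie.java and TrieNode.java may differ.
class PrefixTrieSketch {
    private static final class Node {
        // TreeMap keeps children sorted so expansions come out in alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Insert a word, creating nodes along its path as needed.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Return every stored word that starts with the given prefix.
    List<String> expand(String prefix) {
        List<String> out = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return out; // no stored word has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), out);
        return out;
    }

    // Depth-first walk below the prefix node, recording complete words.
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        PrefixTrieSketch trie = new PrefixTrieSketch();
        for (String w : new String[] {"java", "backend", "framework", "backup"}) {
            trie.insert(w);
        }
        System.out.println(trie.expand("ba"));  // prints [backend, backup]
        System.out.println(trie.expand("jav")); // prints [java]
    }
}
```

A prefix may expand to more than one word, which is why the later steps merge all expansions into one keyword set.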

⚡ Time Complexity Optimization (Important)

❌ Naive Approach (Not Used)

Generate all combinations of prefix-expanded words
Leads to exponential time complexity

O(k^p × D × log D)
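
To make the growth concrete, the sketch below compares how many queries the naive approach would build with how many terms a merged query contains. It assumes that `k` stands for the number of expansions per prefix and `p` for the number of prefixes, which the formula above does not spell out. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not taken from the project's source.
class ExpansionCountSketch {
    // Naive approach: one query per combination, so the counts multiply.
    static long naiveQueryCount(List<List<String>> expansions) {
        long count = 1;
        for (List<String> words : expansions) {
            count *= words.size();
        }
        return count;
    }

    // Merged approach: one query over the union, so the counts add (minus duplicates).
    static int mergedTermCount(List<List<String>> expansions) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> words : expansions) {
            merged.addAll(words);
        }
        return merged.size();
    }

    public static void main(String[] args) {
        List<List<String>> expansions = Arrays.asList(
            Arrays.asList("java", "javascript"),
            Arrays.asList("backend", "backup", "bash"),
            Arrays.asList("framework", "frontend"));
        System.out.println(naiveQueryCount(expansions)); // prints 12
        System.out.println(mergedTermCount(expansions)); // prints 7
    }
}
```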

✅ Optimized Approach (Used in This Project)

No combinatorial query generation
Single-pass TF-IDF computation using all expanded keywords
Uses a Priority Queue to keep only Top 5 results

Final Complexity:

O(T × avgDocs)

Where:

T = number of expanded query terms
avgDocs = average documents per term
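
The single-pass scoring and Top 5 selection described above could look roughly like the following. This is a sketch under assumptions: the index layout (term to document-id to term frequency), the name `TopKScoringSketch`, and the TF-IDF variant `tf × ln(N / df)` are illustrative choices, not necessarily what `TFIDFCalculator.java` does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; the index layout and TF-IDF variant are assumptions.
class TopKScoringSketch {
    // index: term -> (docId -> term frequency in that document)
    static List<Integer> topK(Map<String, Map<Integer, Integer>> index,
                              List<String> terms, int totalDocs, int k) {
        // Visit each expanded term's postings once: O(T * avgDocs).
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : terms) {
            Map<Integer, Integer> postings = index.get(term);
            if (postings == null) {
                continue; // term does not occur in any document
            }
            double idf = Math.log((double) totalDocs / postings.size());
            for (Map.Entry<Integer, Integer> p : postings.entrySet()) {
                scores.merge(p.getKey(), p.getValue() * idf, Double::sum);
            }
        }
        // Min-heap capped at k entries: the weakest kept result is evicted first.
        PriorityQueue<Map.Entry<Integer, Double>> heap =
            new PriorityQueue<Map.Entry<Integer, Double>>(
                (a, b) -> Double.compare(a.getValue(), b.getValue()));
        for (Map.Entry<Integer, Double> e : scores.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // highest score first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> index = new HashMap<>();
        index.put("java", new HashMap<>());
        index.get("java").put(1, 1);
        index.get("java").put(2, 2);
        index.get("java").put(3, 1);
        index.put("backend", new HashMap<>());
        index.get("backend").put(1, 1);
        index.get("backend").put(3, 2);
        index.put("framework", new HashMap<>());
        index.get("framework").put(3, 1);
        List<String> terms = java.util.Arrays.asList("java", "backend", "framework");
        System.out.println(topK(index, terms, 4, 5)); // prints [3, 1, 2]
    }
}
```

Because the heap never holds more than `k` entries, selecting the top results adds only a `log k` factor per matched document, which is a constant for `k = 5`.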

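➡️ Query cost therefore grows with the number of postings visited rather than with the number of keyword combinations, which keeps the engine scalable as more prefixes are typed.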

🧱 Data Structures Used

| Data Structure | Purpose |
| --- | --- |
| `HashMap` | Inverted Index, document storage |
| `Trie` | Prefix-based word expansion |
| `PriorityQueue` | Ranking top search results |
| `Set` | Remove duplicate keywords |
| `List` | Store documents and results |
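
As an illustration of the `HashMap`-backed inverted index, here is a minimal sketch. The class name `InvertedIndexSketch`, the nested-map layout, and the split-on-non-alphanumerics tokenization are assumptions made for the example; the project's `InvertedIndex.java` may be organized differently.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the project's InvertedIndex.java.
class InvertedIndexSketch {
    // term -> (docId -> number of times the term occurs in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    // Tokenize the text and record one occurrence per token for this document.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split yields an empty token when text starts with punctuation
            }
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Return the documents containing the term, with their term frequencies.
    Map<Integer, Integer> lookup(String term) {
        Map<Integer, Integer> found = postings.get(term.toLowerCase(Locale.ROOT));
        return found == null ? Collections.<Integer, Integer>emptyMap() : found;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "Java backend, Java!");
        index.addDocument(2, "Spring framework for Java");
        System.out.println(index.lookup("java"));
        System.out.println(index.lookup("framework"));
    }
}
```

Looking up a term is a single hash access, so query time depends on the length of the posting list rather than on the total number of documents.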

🧮 Algorithms Used

Inverted Indexing
TF-IDF (Term Frequency – Inverse Document Frequency)
Trie-based Prefix Search
Greedy Ranking using Priority Queue
String Tokenization & Normalization
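
Tokenization and normalization of the query might be sketched as below. The rules used here (lowercasing, splitting on anything that is not a letter or digit, dropping repeated tokens with a `Set`) are assumptions for illustration; the project's `Tokenizer.java` may normalize differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the project's Tokenizer.java.
class QueryTokenizerSketch {
    // Lowercase the query, split it into tokens, and drop duplicates.
    static List<String> tokenize(String query) {
        // LinkedHashSet removes repeats while keeping first-seen order.
        Set<String> unique = new LinkedHashSet<>();
        for (String raw : query.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty()) {
                unique.add(raw);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("  Jav ba, FR jav ")); // prints [jav, ba, fr]
    }
}
```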

🖥️ How to Run This Project Locally

📌 Prerequisites

Java JDK 8 or above
Git
Terminal / Command Prompt / WSL

🔽 Clone the Repository

git clone https://github.com/adarsh-7-satyam/search-engine-java
cd search-engine-java

▶️ Compile and Run the Project

javac *.java
java Main

✍️ Sample Input

jav ba fr

✅ Sample Output

Expanded keywords used for search:

java
backend
framework

Top 5 Search Results:

Spring FRAMEWORK simplifies Java BACKEND development
Spring Boot is a Java FRAMEWORK for BACKEND systems
Java BACKEND developers often use Spring FRAMEWORK ...
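
Keyword highlighting of the kind shown above can be approximated by uppercasing whole-word matches. The sketch below is an assumption about how this might be done, with a hypothetical class name `HighlightSketch`. It matches case-insensitively, so unlike the sample output it would also uppercase "Java"; the project's own highlighting evidently behaves differently on that point.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the project's highlighting logic may differ.
class HighlightSketch {
    // Uppercase every whole-word, case-insensitive occurrence of each keyword.
    static String highlight(String text, List<String> keywords) {
        String result = text;
        for (String kw : keywords) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(kw) + "\\b",
                                        Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(result);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String upper = m.group().toUpperCase(Locale.ROOT);
                m.appendReplacement(sb, Matcher.quoteReplacement(upper));
            }
            m.appendTail(sb);
            result = sb.toString();
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("java", "backend", "framework");
        System.out.println(
            highlight("Spring framework simplifies Java backend development", keywords));
        // prints: Spring FRAMEWORK simplifies JAVA BACKEND development
    }
}
```

The `\b` word boundaries keep a keyword such as `java` from matching inside a longer word such as `javascript`.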

🎯 Learning Outcomes

Practical understanding of search engine internals
Efficient use of DSA in real-world problems
Experience with performance optimization
Clean separation of search logic and presentation

📌 Future Enhancements

Stop-word removal
Cosine similarity ranking
Spell correction
REST API using Spring Boot
Web-based UI

👨‍💻 Author

Adarsh Satyam, B.Tech CSE, IIT Bhilai
