Designed a scalable and efficient search engine in Python that queries a ~75GB Wikipedia corpus with a response time of 1s and returns the top 10 most relevant documents for a search query.
information-retrieval
regex
information-extraction
vector-space-model
inverted-index
data-preprocessing
tfidf
sax-parser
secondary-index
minheap
Updated Jul 27, 2021 · Python
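The topic tags point to a classic ranking pipeline: regex tokenization, an inverted index, tf-idf weighting in a vector space model, and a min-heap to keep the top results. The following is a minimal in-memory sketch of that query step. It is not the repository's code. The names `tokenize`, `build_index` and `search`, the regex, and the log-scaled tf-idf variant (summed over query terms rather than full cosine similarity) are all illustrative assumptions.

```python
import heapq
import math
import re
from collections import Counter, defaultdict

# Assumed tokenizer: lowercase alphanumeric runs extracted with a regex.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Build an in-memory inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs, k=10):
    """Score documents by summed tf-idf over query terms; return top-k (doc_id, score)."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term)
        if not postings:
            continue  # term never seen in the corpus
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            # Log-scaled term frequency times inverse document frequency.
            scores[doc_id] += (1 + math.log(tf)) * idf

    # Min-heap of size k: the root is the weakest of the current top-k,
    # so each new candidate only has to beat that one entry.
    heap = []
    for doc_id, score in scores.items():
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


if __name__ == "__main__":
    docs = {
        1: "Python is a programming language",
        2: "The python is a large snake",
        3: "Wikipedia is a free online encyclopedia",
    }
    index = build_index(docs)
    # Doc 1 matches both query terms, so it should rank above doc 2.
    print(search("python language", index, n_docs=len(docs), k=2))
```

The heap keeps top-k selection at O(n log k) instead of sorting every scored document. At ~75GB the postings would not fit in memory. The `sax-parser` and `secondary-index` tags suggest that the project streams the XML dump and looks up postings on disk, which this sketch does not attempt.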