A scalable, efficient search engine written in Python that queries a ~75 GB Wikipedia corpus with a response time of 1 s and returns the top 10 most relevant documents for a search query.
information-retrieval regex information-extraction vector-space-model inverted-index data-preprocessing tfidf sax-parser secondary-index minheap
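As a rough illustration of how the listed techniques fit together, here is a minimal in-memory sketch of tf-idf ranking over an inverted index with heap-based top-k selection. It is not taken from the repository: the names `tokenize`, `build_index`, and `search` are hypothetical, the log-scaled tf weighting is an assumed variant, and a real 75 GB corpus would need an on-disk index with a secondary index rather than a Python dict.

```python
import heapq
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase alphanumeric runs only.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query, k=10):
    """Score documents by summed tf-idf and return the k best (doc_id, score) pairs."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the corpus
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            # Assumed weighting: log-scaled term frequency times idf.
            scores[doc_id] += (1 + math.log(tf)) * idf
    # heapq.nlargest keeps a bounded heap of size k internally.
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
```

With a toy corpus such as `{1: "python search engine", 2: "wikipedia corpus dump"}`, calling `search(build_index(docs), len(docs), "wikipedia")` returns a single hit for document 2. Selecting with a heap avoids sorting every scored document when only the top 10 are needed.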
Updated Jul 27, 2021 - Python