The goal of this assignment is to develop a search engine from the ground up that can handle tens of thousands of documents under harsh operational constraints, with a query response time under 300 ms.
Under the Algorithms and Data Structures Developer version, we created two separate programs: an indexer and a search component. After running the indexer across the entire collection of crawled pages, we prompt the user for a query through a web GUI and respond with a list of URLs where the query appears.
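The split between indexer and search component can be illustrated with a minimal inverted-index sketch. This is not the project's actual code: the function names (tokenize, build_index, search), the regex tokenizer, and the rule that a page must contain every query term are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexer step: map each token to the set of URLs whose text contains it.

    `pages` is a hypothetical dict of {url: page_text}.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Search step: return the sorted URLs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get avoids inserting empty entries into the defaultdict on a miss.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    pages = {
        "http://a.example/": "Machine learning basics",
        "http://b.example/": "Learning to cook",
    }
    index = build_index(pages)
    print(search(index, "machine learning"))
```

Building the index once up front is what makes fast query responses plausible: each query only looks up and intersects a few sets instead of scanning every page.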
Use the package manager pip to install the dependencies: nltk, BeautifulSoup, pandas, and Flask.
pip install --user -U nltk
pip install beautifulsoup4
pip install pandas
pip install Flask
To run the program, run gui.py. To check that the app is running, open localhost:5000/ in a browser.