This is the final project for CIS 555 - Internet & Web Systems, a course offered by the University of Pennsylvania in the Spring of 2015. The group members are Kelsey Duncombe-Smith (Penn '15), Mark Harding (Penn '15), Alex Harelick (Penn '16), and Corey Loman (Penn '16).
The project was to build a search engine consisting of a web crawler, an indexer, and a user interface. This repository contains the code for the indexer and for the first iteration of our web crawler. Links to the other repositories are listed below, along with other helpful resources for the project. Note that the MapReduce code for PageRank is not public; the rest of the project's code is. Let us know if you have any questions.
Helpful Resources:
- User Interface Repository: https://github.com/kelseyds/cis555-project-ui
- Web Crawler Version 2.0 Repository: https://github.com/aharelick/cis555-project-v2
- Project Specifications: http://www.cis.upenn.edu/~cis455/assignments/Final%20Project.pdf
- Final Report: https://docs.google.com/document/d/1-G_0nbEywJSXMK4zUStiZ08QLN8IVi1z_Km_M9R7km8/edit?usp=sharing