A PageRank algorithm implementation in Python.
The files are built for Python 2.7.x.
On Windows: open the Python editor IDLE from the menu, open xxxx.py (e.g. Task-1.py for Task-1), then press F5 to run it.
On Linux:
- In a terminal, type chmod u+rx xxxx.py (e.g. Task1_HtmlTextExtracter.py for Task1_HtmlTextExtracter) and press Enter to make the file executable
- Type python xxxx.py (e.g. Task-1.py for Task-1) and press Enter
- Provide the corpus file in the directory "Corpus\Task1\..."
- G1.txt - outlink graph for wiki urls (top 1000) starting from wiki/Sustainable_energy
- Task_2_G1_Perplexity.txt - Perplexities after running page rank on G1
- Task_2_G2_Perplexity.txt - Perplexities after running page rank on G2
- Task_2_G1_Top50.txt - Top 50 pages from G1
- Task_2_G2_Top50.txt - Top 50 pages from G2
- Task1_Report.txt - Summary of G1 and G2
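
For readers who want to see how the perplexity and top-50 files relate to the ranking itself, here is a minimal sketch of iterative PageRank with a perplexity-based stopping rule. It is not taken from the scripts in this repository. The function names (`pagerank`, `perplexity`), the damping factor of 0.85, the handling of pages without outlinks, and the rule "stop once perplexity changes by less than `tolerance` for `stable_rounds` consecutive iterations" are all assumptions made for illustration. The layout of G1.txt is not described here, so the sketch takes the graph as an in-memory dictionary mapping each page to the pages it links to.

```python
import math


def perplexity(ranks):
    """Return 2 ** H, where H is the Shannon entropy (in bits) of the ranks."""
    entropy = -sum(p * math.log(p, 2) for p in ranks.values() if p > 0)
    return 2 ** entropy


def pagerank(outlinks, damping=0.85, tolerance=1.0, stable_rounds=4,
             max_iter=1000):
    """Iterative PageRank over a dict {page: [pages it links to]}.

    All defaults are illustrative assumptions, not values from the scripts.
    Returns (ranks, history), where history holds one perplexity per iteration.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(outlinks)
    for targets in outlinks.values():
        pages.update(targets)
    n = len(pages)

    ranks = dict((page, 1.0 / n) for page in pages)
    history = []
    stable = 0
    for _ in range(max_iter):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        sink_mass = sum(ranks[p] for p in pages if not outlinks.get(p))
        base = (1.0 - damping) / n + damping * sink_mass / n
        new_ranks = dict((page, base) for page in pages)

        # Each page passes an equal share of its rank to every outlink.
        for page, targets in outlinks.items():
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share

        ranks = new_ranks
        history.append(perplexity(ranks))

        # Count consecutive iterations with a small perplexity change.
        if len(history) > 1 and abs(history[-1] - history[-2]) < tolerance:
            stable += 1
            if stable >= stable_rounds:
                break
        else:
            stable = 0
    return ranks, history


# Hypothetical four-page graph; a tight tolerance suits such a small example.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks, history = pagerank(graph, tolerance=1e-9)
top = sorted(ranks, key=ranks.get, reverse=True)[:50]
```

In this sketch, `history` corresponds to the kind of per-iteration values a perplexity file would list, and `top` to a top-50 listing. On a graph of about 1000 pages a looser tolerance is usually enough, since perplexity can be as large as the number of pages.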