python-page-rank-implementation

A PageRank algorithm implementation in Python.

The files are built on Python version 2.7.x.

How to run the files

In Windows: Open the Python editor IDLE from the menu, open xxxx.py (e.g. Task-1.py for Task-1), then press F5 to run it.

In Linux:

In a terminal, type chmod u+rx xxxx.py (e.g. Task1_HtmlTextExtracter.py for Task1_HtmlTextExtracter) to make the file executable, and press Enter.
Type python xxxx.py (e.g. Task-1.py for Task-1) and press Enter.
Provide the corpus file in the directory "Corpus\Task1\...".

G1.txt - outlink graph for Wiki URLs (top 1000), starting from wiki/Sustainable_energy
Task_2_G1_Perplexity.txt - perplexities after running PageRank on G1
Task_2_G2_Perplexity.txt - perplexities after running PageRank on G2
Task_2_G1_Top50.txt - top 50 pages from G1
Task_2_G2_Top50.txt - top 50 pages from G2
Task_1_Report.txt - summary of G1 and G2
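
The perplexity and top-page outputs listed above come from running PageRank over a link graph. The block below is a minimal sketch of that idea, not the code in this repository: the function names, the in-memory dict input, the damping factor of 0.85, and the stopping rule (perplexity changing by less than 1 for four consecutive iterations) are all assumptions made for illustration. The format of G1.txt and G2.txt is not described here, so the sketch does not try to parse them. It sticks to syntax that Python 2.7 also accepts.

```python
import math


def perplexity(ranks):
    """Return 2 ** H(ranks), where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log(p, 2) for p in ranks.values() if p > 0)
    return 2 ** entropy


def page_rank(out_links, damping=0.85, max_iter=100, stable_rounds=4):
    """Iterate PageRank over a dict {page: [pages it links to]}.

    Assumed stopping rule: perplexity changes by less than 1 for
    `stable_rounds` consecutive iterations, or `max_iter` is reached.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}, []

    # Invert the outlink graph so each page knows who links to it.
    in_links = {p: set() for p in pages}
    out_degree = {p: 0 for p in pages}
    for source, targets in out_links.items():
        unique_targets = set(targets)
        out_degree[source] = len(unique_targets)
        for target in unique_targets:
            in_links[target].add(source)

    sinks = [p for p in pages if out_degree[p] == 0]
    ranks = {p: 1.0 / n for p in pages}
    history = [perplexity(ranks)]
    stable = 0

    for _ in range(max_iter):
        # Rank held by sink pages is spread evenly over all pages.
        sink_mass = sum(ranks[p] for p in sinks)
        new_ranks = {}
        for p in pages:
            rank = (1.0 - damping) / n + damping * sink_mass / n
            rank += damping * sum(ranks[q] / out_degree[q] for q in in_links[p])
            new_ranks[p] = rank
        ranks = new_ranks

        # Track perplexity per iteration and count consecutive small changes.
        history.append(perplexity(ranks))
        if abs(history[-1] - history[-2]) < 1.0:
            stable += 1
        else:
            stable = 0
        if stable >= stable_rounds:
            break

    return ranks, history


# Hypothetical four-page graph, not taken from G1 or G2.
example_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
example_ranks, example_history = page_rank(example_graph)

# Print pages by descending rank; a top-50 list would keep the first 50.
for page in sorted(example_ranks, key=example_ranks.get, reverse=True):
    print("%s %.4f" % (page, example_ranks[page]))
```

The returned history holds one perplexity value per iteration, starting with the uniform distribution. On a graph this small the perplexity-based rule stops early, so a stricter threshold or a larger stable_rounds value may be needed before the ordering settles.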
