Skip to content

Latest commit

 

History

History
63 lines (55 loc) · 3.19 KB

README.md

File metadata and controls

63 lines (55 loc) · 3.19 KB

Project Proposal



Problem Definition -


Given a few number of interlinked web pages, we need to sort the pages according to relevance with a given query. The whole idea is 
to create a mini search engine based on Google’s Page Rank Algorithm in this project.

Deliverable(s)


The project would broadly involve five steps:
	• Making and Storing web Pages
	• Searching (Search Engine)
	• Adjacency Matrix formation
	• Rank Calculation(Eigen Analysis)
	• Sorting of web pages and showing the results.

Input(s)
	- Webpages (HTML) which contains links of another pages.
	- Query related to those pages
Output(s)
	- List of relevant pages according to their relativity of given query.

Basic Philosophy:


Just like a research paper is more important because it has lots of citations, a web page is also more important if lots of other 
web pages refer to it i.e. has outlink to the page. The importance of a scholarly article is more if it has citation from other 
important articles similarly a page with outlinks from other important pages has higher important than those which does not. Further 
we say that if a page has n outlinks, it distributes all of its’ importance equally to all the pages it is referring to. So it is 
like a democritic system in which pages are voting for other pages by giving equal part of their vote to all pages which are being 
referred by it. The value of vote describe importance of a page.

Implementation:


In this section we will discuss about the implementation procedure that we followed in the environment we worked.
	• For implementation of PageRank we are using LAMP and GNU Octave.
	• We are storing pages, related keywords and corresponding outlinks.
	• It will take keywords from query and find related pages then fetch pageid and outlinks.
	• Then it will make adjacency matrix.
	• Using this adjacent matrix create new link matrix.
	• Check for dangling nodes then handle those nodes by using PageRank formula which is given above.
	• Then it will find eigenvector correspoing to dominant eigenvalues.
	• According to eigenvector it calculate rank vector.
	• Show pages accordingly.

Contribution:


The following table gives the details of contribution of each member:
Members’ Name ID Contribution
Dilip Puri 201351014 Implementation(coding) and Presentation
Abhijeet Singh 201351005 Implementation
Chirag Panpalia 201351001 Problem study and algorithm development for implementation
Sonu Patidar 201351016 Problem study and algorithm development for implementation, report preparation