
A Java-based web crawler. A web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of web indexing (web spidering).


shaileshpandey11/web-crawler

web-crawler

Introduction

A web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of web indexing (web spidering). This is a Java application that crawls the pages of a specific domain.
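At its core, crawling a single domain can be understood as a breadth-first traversal of links that never leaves the starting host. The sketch below illustrates that idea. It is not taken from this repository: the class name DomainCrawler, the injected fetcher function, the maxPages limit and the regex-based link extraction are all hypothetical choices made to keep the example self-contained. A real crawler would fetch pages over HTTP and use a proper HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not this project's code): a breadth-first crawler
 * restricted to the seed URL's host. Page fetching is injected as a function
 * so the traversal logic can be shown without network access.
 */
public class DomainCrawler {

    // Naive href extraction; a real crawler would use an HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    private final Function<String, String> fetcher; // URL -> HTML body, or null if unavailable
    private final int maxPages;                      // upper bound on pages visited

    public DomainCrawler(Function<String, String> fetcher, int maxPages) {
        this.fetcher = fetcher;
        this.maxPages = maxPages;
    }

    /** Returns the visited URLs in breadth-first order. */
    public Set<String> crawl(String seedUrl) {
        String host = URI.create(seedUrl).getHost();
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seedUrl);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already crawled
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // fetch failed; the URL stays marked as visited
            }
            URI base = URI.create(url);
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                URI link;
                try {
                    link = base.resolve(m.group(1).trim()); // resolve relative links
                } catch (IllegalArgumentException e) {
                    continue; // skip malformed links
                }
                // Only follow links that stay on the seed's host.
                if (host != null && host.equalsIgnoreCase(link.getHost())) {
                    String next = link.toString();
                    int hash = next.indexOf('#');
                    if (hash >= 0) {
                        next = next.substring(0, hash); // drop #fragments
                    }
                    if (!visited.contains(next)) {
                        frontier.add(next);
                    }
                }
            }
        }
        return visited;
    }
}
```

Passing a Map<String, String> of URL-to-HTML pairs as the fetcher (for example pages::get) lets the traversal be tried offline; links pointing to other hosts are simply ignored, which is what keeps the crawl inside one domain.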

Getting Started

Follow the instructions below to get the application up and running.

Prerequisites

  1. Maven installed.
  2. Java 1.8 or higher installed.

How to run this application

  1. Clone this repository.
  2. Import the project into your IDE as a Maven Java project.
  3. Build the project by running mvn clean install from a terminal in the project's root directory, or by using your IDE's Maven "clean install" action.

Built With

  • Maven - Dependency Management

Planned features

  1. Add a queue for Future tasks.
  2. Add a UI where you just type a domain URL and get nicely formatted HTML output.
  3. Add a database to store results and retrieve them by various criteria.
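The first planned item is terse. One possible reading is a queue of java.util.concurrent.Future objects, so that page fetches run in parallel on a thread pool while results are still collected in submission order. The sketch below assumes that reading; FutureTaskQueue, fetchAll and the injected fetcher are hypothetical names, not part of the project.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch of a "queue of Future tasks" for parallel page fetches. */
public class FutureTaskQueue {

    /** Fetches every URL on a thread pool; a failed fetch yields null at its position. */
    public static List<String> fetchAll(List<String> urls, Function<String, String> fetcher, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Queue<Future<String>> pending = new ArrayDeque<>();
        try {
            // Submit all fetches up front; each Future is queued in submission order.
            for (String url : urls) {
                pending.add(pool.submit(() -> fetcher.apply(url)));
            }
            List<String> results = new ArrayList<>();
            while (!pending.isEmpty()) {
                Future<String> f = pending.poll();
                try {
                    results.add(f.get()); // blocks until this particular task completes
                } catch (ExecutionException e) {
                    results.add(null);    // the task threw; record a failed fetch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    pool.shutdownNow();                 // cancel remaining work
                    throw new IllegalStateException("Interrupted while waiting for fetches", e);
                }
            }
            return results;
        } finally {
            pool.shutdown(); // let worker threads exit once all tasks are done
        }
    }
}
```

Draining the queue in order keeps results aligned with the input list, at the cost of waiting on a slow early fetch even when later ones have finished; java.util.concurrent.ExecutorCompletionService would be the usual alternative if completion order is preferred.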
