A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering). This is a Java application that crawls a specific domain.
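To make "crawls a specific domain" concrete, here is a minimal sketch of a breadth-first, single-domain crawl. It is not this project's actual code: the class name `DomainCrawlerSketch`, the `crawl` method, the `fetcher` parameter, and the `maxPages` limit are all hypothetical names chosen for illustration. It also assumes links can be found with a simple regular expression, where a real crawler would more likely use an HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the class layout used by this repository.
public class DomainCrawlerSketch {

    // Naive matcher for double-quoted href values; the fragment part (#...) is dropped.
    private static final Pattern HREF = Pattern.compile("href\\s*=\\s*\"([^\"#]+)");

    /**
     * Breadth-first crawl from startUrl that follows only links whose host
     * matches the host of startUrl.
     *
     * @param fetcher  assumed helper that maps a URL to its HTML, or null on failure
     * @param maxPages upper bound on the number of pages visited
     * @return visited URLs in the order they were crawled
     */
    public static Set<String> crawl(String startUrl, Function<String, String> fetcher, int maxPages) {
        String domain = URI.create(startUrl).getHost();
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(startUrl);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already crawled
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // page could not be fetched
            }
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                URI link;
                try {
                    // Resolve relative links against the current page.
                    link = URI.create(url).resolve(m.group(1).trim());
                } catch (IllegalArgumentException e) {
                    continue; // skip malformed links
                }
                boolean sameDomain = domain != null && domain.equalsIgnoreCase(link.getHost());
                if (sameDomain && !visited.contains(link.toString())) {
                    frontier.add(link.toString());
                }
            }
        }
        return visited;
    }
}
```

Passing the fetcher in as a function keeps the sketch independent of any HTTP client: for a quick experiment, `crawl("http://example.com/", pages::get, 100)` works with an in-memory `Map<String, String>` named `pages` that maps URLs to HTML.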
Follow the instructions below to get the application up and running.
- Maven installed.
- Java 1.8 or higher installed.
- First, clone this repository.
- Import the project into your IDE as a Maven Java project.
- Run the command below from a terminal in the project directory (or use your IDE's Maven support) to build the project:
  `mvn clean install`
- Maven - Dependency Management
- Add a queue for Future tasks.
- Add a UI where you only need to type a domain URL to get nicely formatted HTML output.
- Add a database to store results and retrieve them based on given criteria.