PyCrawler
PyCrawler is very simple to use. It takes five arguments (a sample invocation follows the list):

1) database file name: The file that will be used to store crawl information as a SQLite database. If the file does not exist, it will be created.

2) start url: Crawlers need a place to start! This should be a valid URL.
   e.g. http://www.mysite.com/

3) crawl depth: The number of pages deep the crawler should follow links from the starting URL before backing out.

4) url regex (optional): A regular expression used to filter URLs. If not set, all URLs will be logged.

5) verbose (optional): If you want PyCrawler to print the URLs it visits as it crawls, set this to "true". If the argument is missing or has any other value, it is treated as false.
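A minimal sample invocation, assuming the entry point script is named PyCrawler.py (the script name, database file, URL, depth, and regex here are illustrative; adjust them to your own setup):

    python PyCrawler.py crawl.db http://www.mysite.com/ 3 ".*mysite.*" true

This would crawl three pages deep from http://www.mysite.com/, log only URLs matching ".*mysite.*", print each URL as it is visited, and store the results in crawl.db.

Since the exact tables PyCrawler writes are not documented here, a schema-agnostic way to inspect the resulting database is to list its tables with Python's standard sqlite3 module:

    import sqlite3

    # Open the database file produced by the crawl (hypothetical name).
    conn = sqlite3.connect("crawl.db")

    # List every table PyCrawler created, without assuming a schema.
    for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"):
        print(name)

    conn.close()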
