Concurrent web crawler in Python 3, powered by BeautifulSoup and requests. Crawls a website for all non-relative links and queues them for further crawling until a user-defined 'level' of crawling is reached (or you get bored of waiting for the crawler to finish).
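At its core, each crawl step fetches a page and keeps only the absolute (non-relative) links it finds. A minimal sketch of that idea, using a hypothetical extract_links helper rather than the project's actual functions:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

def extract_links(url):
    # Hypothetical helper: return all non-relative links found on the page at `url`.
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        # Keep only links that carry their own scheme, i.e. non-relative ones.
        if urlparse(href).scheme in ("http", "https"):
            links.add(href)
    return links

For example, extract_links("https://example.com") would return the absolute links on that page, ready to be queued for the next crawl level.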
-
Clone repo locally:
git clone REPO_URL
-
Optional: Create a virtual environment:
python -m venv env
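Activate the environment before installing dependencies. On Linux/macOS:
source env/bin/activate
On Windows (cmd):
env\Scripts\activate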
-
Install requirements:
pip install -r requirements.txt
-
Running unittests:
python -m unittest
-
Running the crawler:
python crawler.py [--help] --url=URL --levels=LEVELS [--indent]
help - displays the help message
url - URL to crawl
levels - maximum levels to crawl from the starting URL
indent - print with indentation; defaults to unindented
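These flags map directly onto argparse. A rough sketch of how such a command line could be parsed (names here are assumptions, not the project's actual code):

import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Concurrent web crawler")
    parser.add_argument("--url", required=True, help="URL to crawl")
    parser.add_argument("--levels", type=int, required=True,
                        help="maximum levels to crawl from the starting URL")
    parser.add_argument("--indent", action="store_true",
                        help="print with indentation (default: unindented)")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args.url, args.levels, args.indent)

For example:
python crawler.py --url=https://example.com --levels=2 --indent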
-
Visit only unique links
- For both 'parent' and 'child' links, do not visit a link that has already been visited. For example, social links appear on every page of a company's website; queueing them for crawling on every page wastes resources, since they already enter the crawl queue the first time they are encountered.
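One way to achieve this is a shared set of visited URLs that is checked before queueing. A sketch, assuming the concurrent workers are threads (names are hypothetical, not the project's code):

import threading

visited = set()
visited_lock = threading.Lock()

def should_visit(url):
    # True the first time a URL is seen, False on every later occurrence.
    with visited_lock:
        if url in visited:
            return False
        visited.add(url)
        return True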
-
Present links on a different interface
- A console application is the fastest to develop, but a web application might present this information in a more organized and navigable format
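As a rough illustration only (not part of the project), the crawl results could be served as a simple HTML page using nothing but the standard library:

from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder results; in practice these would come from the crawler.
crawled_links = ["https://example.com/about", "https://example.com/jobs"]

class LinksHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        items = "".join(f"<li><a href='{url}'>{url}</a></li>" for url in crawled_links)
        body = f"<html><body><h1>Crawled links</h1><ul>{items}</ul></body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), LinksHandler).serve_forever()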
-
Tests
- Can never have enough tests. crawler_swarm's crawler processing logic, for example, does not have any tests at all.
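A sketch of the kind of test that could cover the link-processing logic; the absolute_links helper below is a self-contained stand-in for the crawler's real parsing code, not the project's actual API:

import unittest
from urllib.parse import urlparse
from bs4 import BeautifulSoup

def absolute_links(html):
    # Stand-in for the crawler's processing logic: keep only non-relative links.
    soup = BeautifulSoup(html, "html.parser")
    return {a["href"] for a in soup.find_all("a", href=True)
            if urlparse(a["href"]).scheme in ("http", "https")}

class TestLinkExtraction(unittest.TestCase):
    def test_relative_links_are_ignored(self):
        html = '<a href="https://example.com/a">a</a><a href="/b">b</a>'
        self.assertEqual(absolute_links(html), {"https://example.com/a"})

if __name__ == "__main__":
    unittest.main()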