This is a showcase for a distributed website crawler using one producer and multiple workers. The basic idea is to use RabbitMQ's work queue to schedule scraping/parsing tasks for many pages, with multiple workers consuming the queue simultaneously.
The input for this project is the UK Area Codes website:
http://www.area-codes.org.uk/full-uk-area-code-list.php
The worker crawlers scrape the URLs passed by the producer and parse each city/town name along with its area code. The workers run in parallel.
To run the code you need to set up RabbitMQ and install the pika, requests, and Beautiful Soup Python libraries.
# Create and activate a Python environment
conda create -n crawler python=3.7.2 anaconda
conda activate crawler

# Install the Python dependencies
pip install requests
pip install beautifulsoup4
pip install pika

# Install and start RabbitMQ (macOS, via Homebrew)
brew update
brew install rabbitmq
brew services start rabbitmq

# Enqueue the URLs, then start a worker (repeat in more terminals for more workers)
python producer.py
python worker.py

# Tear down when done
conda deactivate