This project contains a utility for generating sitemaps for a given URL. There are two ways of executing this utility: with Docker, or without it.

**NB:** All CLI commands assume that you have navigated to the same directory as this README file.
## Using Docker

1. Follow the instructions [here](https://docs.docker.com/engine/getstarted/step_one/) to install Docker on your machine. When installation is complete, run `docker --version` from your CLI to verify that it is installed.
2. In your CLI, make the script `install_crawler.sh` executable with the command `chmod a+x install_crawler.sh`.
3. Execute the script using the command `./install_crawler.sh`.
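
   The contents of `install_crawler.sh` aren't reproduced in this README; as a rough sketch, it presumably builds or pulls the `sitecrawler/composer:latest` image used by the crawl command in the next step:

   ```sh
   #!/bin/sh
   # Hypothetical sketch only; the real install_crawler.sh may differ.
   docker build -t sitecrawler/composer:latest .
   ```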
4. Execute the following command in your CLI, making sure to replace `http://example.com` with the URL of the website you wish to crawl:

   ```sh
   # --rm removes the container when it exits; -it attaches an interactive terminal.
   # -d memory_limit=-1 lifts PHP's memory cap, since large crawls can be memory-hungry.
   docker run --rm -it \
       -v $(pwd):/sitecrawler \
       sitecrawler/composer:latest \
       php -d memory_limit=-1 crawl.php http://example.com
   ```
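
   Because `-v $(pwd):/sitecrawler` mounts your working directory into the container, the output the crawler writes inside the container appears directly in your working directory on the host.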

The sitemap (consisting of the visited URLs, links and assets) and the assetmap (just URLs and assets) can be found in the `sitemaps/{url_of_the_crawled_website}` directory.
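
For example, after crawling `http://example.com` you would look for something like the following (the exact directory and file names are illustrative, not confirmed by this README):

```
sitemaps/
└── example.com/
    ├── <sitemap file>
    └── <assetmap file>
```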

## Without Docker

1. This project is built with PHP, so it requires PHP (version 5.6) to be installed on your machine. When installation is complete, run `php --version` from your CLI to verify that it is installed.
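
   For a script-friendly check, PHP can also print its version constant directly:

   ```sh
   # Prints just the version string (e.g. 5.6.40); handy in setup scripts.
   php -r 'echo PHP_VERSION, PHP_EOL;'
   ```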
2. Follow the instructions [here](https://getcomposer.org/download/) to download and install Composer. When installation is complete, run `composer --version` from your CLI to verify that it is installed.
3. Run `composer install` to install the dependencies required by this utility. When installation is done, run `composer dump-autoload` to regenerate Composer's class autoloader.
4. Use the command `php -d memory_limit=-1 crawl.php http://example.com` to start crawling from the supplied URL, making sure to replace `http://example.com` with a URL of your choosing (`-d memory_limit=-1` lifts PHP's memory cap, since large crawls can be memory-hungry).
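
   A minimal end-to-end example (the URL is only an illustration):

   ```sh
   php -d memory_limit=-1 crawl.php http://example.com   # crawl the site
   ls sitemaps/                                          # the generated maps land here
   ```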

As with the Docker workflow, the sitemap (consisting of the visited URLs, links and assets) and the assetmap (just URLs and assets) can be found in the `sitemaps/{url_of_the_crawled_website}` directory.

## Running the Tests

### With Docker

(assumes you've followed STEP 1 in the *Using Docker* section above)

- In your CLI, make the script `test_crawler.sh` executable with the command `chmod a+x test_crawler.sh`.
- Execute the script using the command `./test_crawler.sh` (a sketch of what this script presumably does follows after this list).
- View the test coverage results by opening `build/coverage/index.html` in a browser of your choice.
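
The contents of `test_crawler.sh` aren't reproduced in this README; a rough sketch, assuming it runs the PHPUnit suite inside the same Docker image used for crawling:

```sh
#!/bin/sh
# Hypothetical sketch only; the real test_crawler.sh may differ.
docker run --rm -it \
    -v $(pwd):/sitecrawler \
    sitecrawler/composer:latest \
    vendor/bin/phpunit -c phpunit.xml
```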

### Without Docker

(assumes you've followed STEPS 1, 2 & 3 in the *Without Docker* section above)

- From your CLI, run `vendor/bin/phpunit -c phpunit.xml`.
- View the test coverage results by opening `build/coverage/index.html` in a browser of your choice.
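
Since the suite writes its HTML coverage report to `build/coverage`, `phpunit.xml` presumably configures that target already; you can also request it explicitly (collecting coverage requires a driver such as Xdebug to be installed):

```sh
# Run the suite and write an HTML coverage report to build/coverage.
vendor/bin/phpunit -c phpunit.xml --coverage-html build/coverage
```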