Responsible crawling with Colly. For a better Internet.
Based on lessons learned while writing Idun and subsequently getting banned by half of the website operators...
- HTTP status code 429 (Too Many Requests)
- Links marked rel="nofollow"
- robots.txt
- Actual delay between requests
- URL tests (e.g. file extension, domain)
- Maximum run time