- Option to set `respect_robots_txt` (default should be `True`, since some jurisdictions impose a legal obligation to honor robots.txt)
- Fetch and parse robots.txt (`urllib.robotparser` helps with parsing)
- Create crawl rules per domain
- Check URL permissions before crawling a URL
- Make sure it works when concurrent workers are fetching different domains
- Honor the rules provided in robots.txt when fetching (e.g. use the robots.txt `crawl-delay` if present, and check the rules before crawling a path); see the sketch after this list
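
A minimal sketch of how these pieces could fit together using the standard library's `urllib.robotparser`, assuming a threaded crawler. The class name `RobotsCache` and the parameters `user_agent`, `respect_robots_txt`, and `default_delay` are illustrative, not part of any existing API.

```python
import threading
import urllib.robotparser
from urllib.parse import urlparse


class RobotsCache:
    """Per-domain robots.txt cache with permission and crawl-delay checks."""

    def __init__(self, user_agent="my-crawler", respect_robots_txt=True,
                 default_delay=0.0):
        self.user_agent = user_agent
        self.respect_robots_txt = respect_robots_txt
        self.default_delay = default_delay
        self._parsers = {}              # domain -> RobotFileParser
        self._lock = threading.Lock()   # protects _parsers across workers

    def _parser_for(self, url):
        parts = urlparse(url)
        domain = f"{parts.scheme}://{parts.netloc}"
        with self._lock:
            parser = self._parsers.get(domain)
            if parser is None:
                parser = urllib.robotparser.RobotFileParser()
                parser.set_url(domain + "/robots.txt")
                try:
                    parser.read()        # fetch and parse robots.txt once per domain
                except OSError:
                    parser.allow_all = True   # unreachable robots.txt: allow by default
                self._parsers[domain] = parser
        return parser

    def can_fetch(self, url):
        """Return True if the configured user agent may crawl this URL."""
        if not self.respect_robots_txt:
            return True
        return self._parser_for(url).can_fetch(self.user_agent, url)

    def crawl_delay(self, url):
        """Return the crawl-delay (in seconds) to honor for this URL's domain."""
        if not self.respect_robots_txt:
            return self.default_delay
        delay = self._parser_for(url).crawl_delay(self.user_agent)
        return delay if delay is not None else self.default_delay
```

The lock keeps the per-domain cache consistent when concurrent workers hit different domains, at the cost of serializing robots.txt fetches; a real implementation might fetch robots.txt outside the lock or per worker pool to avoid that bottleneck. A worker would call `can_fetch(url)` before requesting a URL and sleep for `crawl_delay(url)` between requests to the same domain.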