Description
URLs are currently checked in a loop that tests the response of each request sequentially, which becomes slow for large websites.
Alternatively, we can use concurrency to process requests and responses asynchronously and speed up the checks.
I already integrated this idea in my local repo using the asyncio and aiohttp libraries, and the results look promising. Several blog posts (Python and fast HTTP clients, HTTP in Python: aiohttp vs. Requests, Making 1 million requests with python-aiohttp) report a notable speed difference, and so far my tests confirm that.
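For context, the general pattern I am experimenting with looks roughly like this (a minimal sketch, not the actual code in my branch; `check_url`, `check_urls`, and the timeout value are illustrative):

```python
import asyncio
import aiohttp

async def check_url(session, url):
    # Return the URL and whether it responded without a client/server error
    try:
        async with session.get(url) as response:
            return url, response.status < 400
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return url, False

async def check_urls(urls, total_timeout=10):
    # Fire all requests concurrently instead of looping over them one by one
    timeout = aiohttp.ClientTimeout(total=total_timeout)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        tasks = [check_url(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# results = asyncio.run(check_urls(["https://example.com", "https://example.org"]))
```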
The new libraries behave slightly differently from requests, which means:
- A different request/response format.
- Different exception types than before.
- Some disorder in the printed URLs list (due to asynchronicity).
- Duplicate URLs need to be removed before checking, instead of adding URLs to a seen set while looping (see the sketch below).
- A different timeout value may be needed.
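For example, the deduplication and timeout points could look something like this (a small sketch with illustrative names and values, not the actual patch):

```python
import aiohttp

def deduplicate(urls):
    # Drop duplicate URLs up front while preserving input order,
    # instead of tracking a "seen" set inside the checking loop
    return list(dict.fromkeys(urls))

# aiohttp expresses timeouts through ClientTimeout objects, so the value
# currently used with requests may need tuning (10 seconds here is only
# a placeholder)
TIMEOUT = aiohttp.ClientTimeout(total=10)
```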
I managed to almost replicate the features of the current version, but I will definitely need your feedback. These differences bring me to my main question, @vsoch: do you think we should add this feature as an option, --accelerated-run, or replace the current implementation with it?