Accelerate URL checking using concurrency (asyncio + aiohttp) #50

Open
@SuperKogito

Description

URLs are currently checked in a loop that issues each request and waits for its response sequentially, which becomes slow for large websites.


Alternatively, we can use concurrency to process requests and responses asynchronously and speed up the checks.


I have already integrated this approach in my local repo using the asyncio and aiohttp libraries, and the results look promising. Several blog posts (Python and fast HTTP clients, HTTP in Python: aiohttp vs. Requests, Making 1 million requests with python-aiohttp) report a notable speed difference, and so far my tests confirm it.

The new libraries differ slightly from requests, so the following changes apply:

  • A different response format.
  • Different exception types than before.
  • Some disorder in the printed URL list (due to asynchronicity).
  • Duplicate URLs need to be removed before checking, instead of adding URLs to a seen set during the loop.
  • A different timeout value may be needed.

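For the deduplication point above, removing duplicates up front can be done without losing the original order. This is a sketch with a hypothetical helper name (`deduplicate`), not code from the repo:

```python
def deduplicate(urls):
    # dict.fromkeys keeps first-seen order (guaranteed since Python 3.7),
    # so duplicates are dropped without disturbing the crawl order --
    # unlike a set, which would scramble it.
    return list(dict.fromkeys(urls))


print(deduplicate(["a.html", "b.html", "a.html", "c.html"]))
# → ['a.html', 'b.html', 'c.html']
```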
I managed to almost replicate the features we have in the current version, but I will definitely need your feedback. These differences bring me to my main question, @vsoch: do you think we should add this feature as an option, --accelerated-run, or replace the current implementation with it?
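If the concurrent path ships as an opt-in flag, the dispatch could look roughly like this sketch. The flag name --accelerated-run comes from the proposal above; the checker functions here are placeholder stand-ins, not the project's real entry points:

```python
import argparse


def check_urls_sync(urls):
    # Placeholder for the existing sequential requests-based checker.
    return [(url, 200) for url in urls]


def check_urls_async(urls):
    # Placeholder for the proposed asyncio + aiohttp checker.
    return [(url, 200) for url in urls]


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("urls", nargs="+")
    parser.add_argument("--accelerated-run", action="store_true",
                        help="check URLs concurrently with asyncio + aiohttp")
    args = parser.parse_args(argv)
    checker = check_urls_async if args.accelerated_run else check_urls_sync
    return checker(args.urls)
```

Keeping both code paths behind a flag lets users fall back to the sequential checker if the async behavior (ordering, exceptions, timeouts) causes surprises.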

Metadata

Labels

discussion (Discussing features, implementations and enhancements), enhancement (New feature or request)
