Description
The courses now recommend a variety of tools for making HTTP requests, which is sometimes more confusing, sometimes less.
- The `got` library seems to be superseded by `ky`; at least Sindre mentions it in the README.
- Apify develops `got-scraping`, which seems to be tightly coupled to the `got` library. Wouldn't it make sense to turn `got-scraping` into something more agnostic to the client library? Could it just prepare the request details, so that any library can attach them to the request?
- Some guides use `axios`
- Some guides mention `request` and `request-promise`, which are both now deprecated
- Meanwhile, Node.js has adopted `fetch` into the standard library
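To illustrate the decoupling idea from the list above: a client-agnostic helper could return plain request details that any library consumes. This is a hypothetical sketch, not a real `got-scraping` API; the function name and the static header values are invented placeholders:

```javascript
// Hypothetical sketch of a client-agnostic "prepare" step. Instead of
// wrapping one HTTP library, the helper only returns plain request
// details; any client (fetch, ky, axios, got) can then attach them.
// `prepareScrapingRequest` is an invented name, not got-scraping's API.
function prepareScrapingRequest(url) {
  return {
    url,
    headers: {
      // The real got-scraping generates realistic browser-like headers;
      // these static values are placeholders for illustration only.
      "user-agent": "Mozilla/5.0 (X11; Linux x86_64)",
      "accept-language": "en-US,en;q=0.9",
    },
  };
}

// The same details could then be passed to any client, e.g. built-in fetch:
//   const { url, headers } = prepareScrapingRequest("https://example.com");
//   const response = await fetch(url, { headers });

const details = prepareScrapingRequest("https://example.com");
console.log(details.headers["accept-language"]);
```

The point of the sketch is the shape of the return value: plain data instead of a wrapped client, so the header-generation logic stops dictating which HTTP library a course has to teach.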
Using `got-scraping` in the basic tutorial is probably unnecessary; any HTTP client can serve the initial lessons. The value of `got-scraping` should emerge with more complicated use cases.
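For those initial lessons, the zero-dependency option could be the built-in `fetch` API, global in Node.js since v18. A minimal sketch (the example URL is made up); constructing a `Request` shows the moving parts without hitting the network:

```javascript
// Node.js 18+ ships the WHATWG fetch API (fetch, Request, Response)
// without any extra dependency, which may be enough for a first lesson.
// The URL here is a placeholder, not a real endpoint.
const request = new Request("https://example.com/products?page=1", {
  headers: { accept: "text/html" },
});

console.log(request.method);                 // GET (the default)
console.log(new URL(request.url).pathname);  // /products

// Actually sending it is one call away:
//   const response = await fetch(request);
//   const html = await response.text();
```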
But wouldn't it make more sense to skip `got-scraping` and promote Crawlee right away at that point? Is `got-scraping` something Apify wants to spend marketing energy on, or is it an implementation detail?
As of now, `got-scraping` doesn't have good Python alternatives that I know of. There are independent libraries one can use, such as fake-user-agent, which have integrations with scraping frameworks.

As for request libraries, the Python scene is similarly fragmented, featuring `requests`, `aiohttp`, and `httpx`, each with its own fans and use cases.
I'd like to kick off a discussion about the preferred way for the Academy to teach making HTTP requests in 2024, in both Node.js and Python.