feed: stockxapi rate limit #1
b51c779 should address this. It appears that after being blocked on one get_details request, subsequent auth requests keep failing indefinitely without human intervention. Needs more investigation. With aggressive throttling we can mostly get through the current list. When we eventually get stuck (after all the AJs), manually reloading cookies appears to help.
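A minimal sketch of what the manual cookie reload could look like, assuming the cookies are exported from the browser into a JSON file after solving the captcha by hand (the file name and format here are illustrative, not part of the current code):

```python
import json
import requests

def load_cookies(session: requests.Session, path: str = "cookies.json") -> None:
    """Load browser-exported cookies (a JSON list of {"name", "value", ...} dicts) into the session."""
    with open(path) as f:
        for c in json.load(f):
            session.cookies.set(c["name"], c["value"], domain=c.get("domain", ""))

session = requests.Session()
load_cookies(session)  # run this after a manual captcha solve in the browser
```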
They use a third-party anti-bot solution called PerimeterX. Needs some targeted research.
There does not appear to be an easy fix for this PerimeterX thing. Selenium was not able to help click that button: simulating the click in Selenium only triggered more reCAPTCHA checks. As a start, we should make sure not to send duplicate queries. Then we should consider spreading our requests out more; see the sketch below.
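A rough sketch of both ideas, assuming an in-memory seen-set and jittered delays (the delay bounds are illustrative and would need tuning against the real rate limit):

```python
import random
import time

seen_urls = set()

def fetch_once(session, url, min_delay=5.0, max_delay=15.0):
    """Fetch a URL at most once per run, with a randomized delay before each request."""
    if url in seen_urls:
        return None  # skip duplicate queries entirely
    seen_urls.add(url)
    time.sleep(random.uniform(min_delay, max_delay))  # spread requests out
    return session.get(url)
```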
Note that the API endpoint we are using is not the one in the official repo (https://github.com/stockx/PublicAPI). Without knowing PerimeterX's mechanism, the best thing to try now may be a fleet of IP addresses, activated at different times of day.
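Something along these lines, with placeholder proxy endpoints; whether rotating IPs actually gets around PerimeterX is untested:

```python
import itertools
import requests

# Placeholder proxy endpoints; a real fleet would be scheduled across times of day.
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]
_proxy_pool = itertools.cycle(PROXIES)

def get_via_next_proxy(url, **kwargs):
    """Send each request through the next proxy in the pool."""
    proxy = next(_proxy_pool)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, **kwargs)
```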
Throttle time and different logins don't seem to help.
Each item currently takes 4 requests. We could try
This is an IP-based block: once 403'ed, other devices behind the same NAT also have to go through the captcha.
We were not blocked in the last scrape on 07/27. Presumably the block was lifted? Closing for now.
This has been observed again since feedv2 on 20191222.
This is observed in both update and query modes. The current workaround is a set of shell scripts that limit how many items we update each time. If we breach that limit we become temporarily blocked for about 30 minutes, with no human intervention needed. One problem is that a script may never finish updating everything, because of how the limit interacts with requests that did not error out with a 403. A Python sketch of the same idea is below.
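A sketch of that workaround in Python rather than shell, assuming a hypothetical update_item helper that performs the 4 requests per item; the batch size is an illustrative value, not the one in the scripts:

```python
import time

BATCH_LIMIT = 50            # max items to update per run (illustrative value)
COOLDOWN_SECONDS = 30 * 60  # the block appears to lift on its own after ~30 min

def update_batch(items, update_item):
    """Update at most BATCH_LIMIT items; on a 403, wait out the cooldown and retry once."""
    updated = 0
    for item in items:
        if updated >= BATCH_LIMIT:
            break
        resp = update_item(item)          # hypothetical helper doing the 4 requests per item
        if resp.status_code == 403:
            time.sleep(COOLDOWN_SECONDS)
            resp = update_item(item)      # retry once after the cooldown
        updated += 1
```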
The problem has since been addressed and 403 on stockx no longer seems to be a major blocker. |
stockx appears to be one of those sites that constantly upgrade their anti-bot mechanisms.
On 06/02/19 my auth requests got through if they had User-Agent set.
On 06/09/19 I had to add Referer, Origin and Content-Type as well.
On 06/12/19 I had to add these headers to get_details requests too, and I still get 403 after the first few requests. As a short-term solution, a rate limit or multiple sources may do.
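For reference, a sketch of the header set that was needed as of 06/12/19; the User-Agent string and the endpoint URL below are placeholders, not the exact values the feed uses:

```python
import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://stockx.com/",
    "Origin": "https://stockx.com",
    "Content-Type": "application/json",
}

details_url = "https://stockx.com/api/products/example-product"  # placeholder endpoint
resp = requests.get(details_url, headers=HEADERS)
```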
The goal here is to be able to keep scraping stockx without interruption. I can think of
I believe they ultimately want people to use their API, but what I'm doing now is probably too brutal.
@djian618