Describe the bug
I get the following error when I try to scrape Reddit:
snscrape.base.ScraperException: 4 requests to https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000 failed, giving up.
I also tried the Python package and a subreddit search, but neither works.
I tried from another device as well, with the same result.
Any ideas?
How to reproduce
Run snscrape -n 100 -vv reddit-search toto
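For reference, the exact request URL that fails can be reconstructed from the log below using only the query parameters (q=toto, limit=1000), e.g. to inspect the 403 response body manually in a browser or with curl. This is a minimal sketch and not part of snscrape itself:

```python
# Rebuild the request URL seen in the log (q=toto, limit=1000)
# so the 403 response can be inspected outside snscrape.
from urllib.parse import urlencode

base = 'https://api.pushshift.io/reddit/search/submission'
params = {'q': 'toto', 'limit': 1000}
url = f'{base}?{urlencode(params)}'
print(url)  # https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000
```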
Expected behaviour
The search returns submissions matching the query.
Screenshots and recordings
No response
Operating system
Kubuntu 22.04
Python version: output of python3 --version
3.8.8
snscrape version: output of snscrape --version
snscrape 0.7.0.20230622
Scraper
reddit-search
How are you using snscrape?
CLI (snscrape ... as a command, e.g. in a terminal)
Backtrace
snscrape.base.ScraperException: 4 requests to https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000 failed, giving up.
Log output
2023-07-05 09:56:59.215 INFO snscrape.base Retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000
2023-07-05 09:56:59.216 DEBUG snscrape.base ... with headers: {'User-Agent': 'snscrape/0.7.0.20230622'}
2023-07-05 09:56:59.216 DEBUG snscrape.base ... with environmentSettings: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
2023-07-05 09:56:59.217 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): api.pushshift.io:443
2023-07-05 09:56:59.285 DEBUG snscrape.base Connected to: ('172.67.219.85', 443)
2023-07-05 09:56:59.285 DEBUG snscrape.base Connection cipher: ('TLS_AES_256_GCM_SHA384', 'TLSv1.3', 256)
2023-07-05 09:56:59.682 DEBUG urllib3.connectionpool https://api.pushshift.io:443 "GET /reddit/search/submission?q=toto&limit=1000 HTTP/1.1" 403 30
2023-07-05 09:56:59.684 INFO snscrape.base Retrieved https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: 403
2023-07-05 09:56:59.684 DEBUG snscrape.base ... with response headers: {'Date': 'Wed, 05 Jul 2023 07:56:59 GMT', 'Content-Type': 'application/json', 'Content-Length': '30', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'CF-Cache-Status': 'BYPASS', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=y4%2BHEpMTSBJTXPGm4t7j95SB7FGvFVoPOkhN7%2BoPzIMt8rFnrbVatyYC2TKIviyCyOuaYt%2B%2FtN02NPN3AZa%2BCtunP7oatjwYM8k51iOBRkXNrBcTndwFIxVJTfEqILZlwQTp"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '7e1e0df6bb3ad4e5-CDG', 'alt-svc': 'h3=":443"; ma=86400'}
2023-07-05 09:56:59.684 INFO snscrape.base Error retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: non-200 status code, retrying
2023-07-05 09:56:59.684 INFO snscrape.base Waiting 1 seconds
2023-07-05 09:57:00.687 INFO snscrape.base Retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000
2023-07-05 09:57:00.687 DEBUG snscrape.base ... with headers: {'User-Agent': 'snscrape/0.7.0.20230622'}
2023-07-05 09:57:00.688 DEBUG snscrape.base ... with environmentSettings: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
2023-07-05 09:57:00.809 DEBUG urllib3.connectionpool https://api.pushshift.io:443 "GET /reddit/search/submission?q=toto&limit=1000 HTTP/1.1" 403 30
2023-07-05 09:57:00.810 INFO snscrape.base Retrieved https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: 403
2023-07-05 09:57:00.811 DEBUG snscrape.base ... with response headers: {'Date': 'Wed, 05 Jul 2023 07:57:00 GMT', 'Content-Type': 'application/json', 'Content-Length': '30', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'CF-Cache-Status': 'BYPASS', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=RRI7V4%2FKORopA2%2FQFWbrrUnkFlm%2Ftd5O9SismrizB9mCRFBeF2tTFM0L%2FhbJTzPPwHYyQiOZ6ZzhjUyUc%2BkSPQla5B1BqN%2BTV3LcE2%2Fv3y9Q%2FYeQHPp6gIGrjqjfaDO8dRC3"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '7e1e0dff78f0d4e5-CDG', 'alt-svc': 'h3=":443"; ma=86400'}
2023-07-05 09:57:00.811 INFO snscrape.base Error retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: non-200 status code, retrying
2023-07-05 09:57:00.811 INFO snscrape.base Waiting 2 seconds
2023-07-05 09:57:02.815 INFO snscrape.base Retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000
2023-07-05 09:57:02.815 DEBUG snscrape.base ... with headers: {'User-Agent': 'snscrape/0.7.0.20230622'}
2023-07-05 09:57:02.815 DEBUG snscrape.base ... with environmentSettings: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
2023-07-05 09:57:02.938 DEBUG urllib3.connectionpool https://api.pushshift.io:443 "GET /reddit/search/submission?q=toto&limit=1000 HTTP/1.1" 403 30
2023-07-05 09:57:02.938 INFO snscrape.base Retrieved https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: 403
2023-07-05 09:57:02.939 DEBUG snscrape.base ... with response headers: {'Date': 'Wed, 05 Jul 2023 07:57:02 GMT', 'Content-Type': 'application/json', 'Content-Length': '30', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'CF-Cache-Status': 'BYPASS', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=iwHMuR85a9T6e4AsOzZ3nlYUMI4G2ke71fL7PEhrcNRyy%2BUhlTw9OhJgogU4NAWUKAY1gXhPNQgoSAZSct65B2fLZviQvfVhJwWAS7EWe%2BG0jcjKm4ot9p11cAMDQQQLmJ3P"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '7e1e0e0cc998d4e5-CDG', 'alt-svc': 'h3=":443"; ma=86400'}
2023-07-05 09:57:02.939 INFO snscrape.base Error retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: non-200 status code, retrying
2023-07-05 09:57:02.939 INFO snscrape.base Waiting 4 seconds
2023-07-05 09:57:06.945 INFO snscrape.base Retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000
2023-07-05 09:57:06.945 DEBUG snscrape.base ... with headers: {'User-Agent': 'snscrape/0.7.0.20230622'}
2023-07-05 09:57:06.945 DEBUG snscrape.base ... with environmentSettings: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
2023-07-05 09:57:07.066 DEBUG urllib3.connectionpool https://api.pushshift.io:443 "GET /reddit/search/submission?q=toto&limit=1000 HTTP/1.1" 403 30
2023-07-05 09:57:07.067 INFO snscrape.base Retrieved https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: 403
2023-07-05 09:57:07.067 DEBUG snscrape.base ... with response headers: {'Date': 'Wed, 05 Jul 2023 07:57:07 GMT', 'Content-Type': 'application/json', 'Content-Length': '30', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'CF-Cache-Status': 'BYPASS', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=sbdEsUwu7UnHrollCV0oSOt0FUSXPBvUgjqRiWXSV0A%2BNpdcPvsXdaETxaF8GYBdD0k02i5vWa8sK%2FnZnSCNU5T0VPs3FMTx5yhC7E9LkDFzczUz5ZkXmrzoHoN4%2FcQEJYqI"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '7e1e0e2699d8d4e5-CDG', 'alt-svc': 'h3=":443"; ma=86400'}
2023-07-05 09:57:07.067 ERROR snscrape.base Error retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: non-200 status code
2023-07-05 09:57:07.067 CRITICAL snscrape.base 4 requests to https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000 failed, giving up.
2023-07-05 09:57:07.067 CRITICAL snscrape.base Errors: non-200 status code, non-200 status code, non-200 status code, non-200 status code
2023-07-05 09:57:07.118 CRITICAL snscrape._cli Dumped stack and locals to /tmp/snscrape_locals_j8mi7h4g
Traceback (most recent call last):
File "/home/matthieu-inspiron/anaconda3/bin/snscrape", line 8, in <module>
sys.exit(main())
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/_cli.py", line 323, in main
for i, item in enumerate(scraper.get_items(), start = 1):
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/modules/reddit.py", line 219, in get_items
yield from self._iter_api_submissions_and_comments({type(self)._apiField: self._name})
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/modules/reddit.py", line 185, in _iter_api_submissions_and_comments
tipSubmission = next(submissionsIter)
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/modules/reddit.py", line 143, in _iter_api
obj = self._get_api(url, params = params)
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/modules/reddit.py", line 94, in _get_api
r = self._get(url, params = params, headers = self._headers, responseOkCallback = self._handle_rate_limiting)
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/base.py", line 275, in _get
return self._request('GET', *args, **kwargs)
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/base.py", line 271, in _request
raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000 failed, giving up.
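The "Waiting 1/2/4 seconds" lines in the log suggest snscrape applies exponential backoff between its four attempts before giving up. A minimal sketch of that pattern, assuming the delay simply doubles after each failed attempt (as observed in the log, not taken from snscrape's source):

```python
# Sketch of the retry pattern visible in the log: 4 attempts,
# with the wait doubling after each failure (1 s, 2 s, 4 s).
def backoff_delays(attempts=4, base=1.0):
    """Delays waited between consecutive failed attempts."""
    return [base * 2 ** i for i in range(attempts - 1)]

print(backoff_delays())  # [1.0, 2.0, 4.0]
```

Since every attempt receives the same Cloudflare 403, the backoff cannot help here; the block is on the server side, not a transient failure.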
Dump of locals
I would prefer to send it privately.
Additional context
No response