Open
Description
When running selenium-wire in Scrapy Cloud, this issue is caused by a call to socket.setdefaulttimeout in sh_scrapy.crawl.
The related issue in pyOpenSSL: pyca/pyopenssl#168
The cause of the issue is this call
https://github.com/scrapinghub/scrapinghub-entrypoint-scrapy/blob/master/sh_scrapy/crawl.py#L27
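As a rough illustration of why a single call can affect unrelated libraries: socket.setdefaulttimeout is process-wide, so every socket created afterwards inherits the timeout, including sockets opened by other packages in the same process. The sketch below uses only the standard library; the helper name new_socket_timeout is made up for this example, and it does not reproduce the SSL error itself.

```python
import socket


def new_socket_timeout():
    """Return the timeout that a freshly created socket inherits."""
    # The socket is never connected; it is only created and inspected.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.gettimeout()


# Remember the current process-wide default so it can be restored afterwards.
original = socket.getdefaulttimeout()
before = new_socket_timeout()

# Same kind of call as the one in sh_scrapy.crawl (60 is the value used in the
# reproduction steps of this issue).
socket.setdefaulttimeout(60)
try:
    after = new_socket_timeout()
finally:
    # Undo the global change so the rest of the process is unaffected.
    socket.setdefaulttimeout(original)

restored = new_socket_timeout()
print("before:", before, "after:", after, "restored:", restored)
```

Sockets created while the default is set are in timeout mode rather than plain blocking mode, which appears to be the condition discussed in the pyOpenSSL issue linked above.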
A minimal example to reproduce the issue
Clone https://github.com/pawelmhm/quotesbot/tree/selenium
Working case
cd quotesbot
docker build -t selenium-wire-scrapy-issue .
docker run selenium-wire-scrapy-issue scrapy crawl toscrape-css
It should finish properly, scraping 100 items with no errors.
Now update settings.py and add the following to the top of the file:
import socket
socket.setdefaulttimeout(60)
Then rebuild and rerun the container:
docker build -t selenium-wire-scrapy-issue .
docker run selenium-wire-scrapy-issue scrapy crawl toscrape-css
Now it should fail, scraping only 10 items and showing a net::ERR_SSL_PROTOCOL_ERROR error in the logs.