Is it possible to use chrome-prerender as a squid parent in a proxy sandwich setup? #35
Comments
This minimal patch in app.py was good enough for a poc. :)
I haven't used squid with chrome-prerender before, so I am not sure what's wrong. Could you elaborate?
I wanted to see the render quality of the Chrome engine with my own eyes in a browser, and to do a test spider run on an existing Angular SPA with an old-school tool, HTTrack, to see whether the whole Angular app is crawlable. For both I needed a regular proxy interface, so I put a Squid proxy in front of prerender. Squid is configured to fetch all static file requests directly and to send the remaining requests to its chrome-prerender parent proxy. As a proxy client, Squid sends requests to its "parent proxy" (in this case prerender) in a different form than prerender expects, so I had to make prerender understand them.
[2017-12-28 17:34:06 +0100] - (sanic.access)[INFO][1:2]: GET http://www.nytimes.com/ 400 11
It worked after replacing if not parsed_url.hostname: with if not parsed_url.hostname:
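To illustrate the mismatch described above, here is a minimal sketch of how a handler could accept both request forms. It assumes the direct form carries the page URL in the path (`/http://www.nytimes.com/`) while a proxy client such as Squid sends the page URL itself as the request target. The function name `extract_target_url` is hypothetical and is not taken from chrome-prerender's app.py.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_target_url(request_target: str) -> Optional[str]:
    """Return the page URL to render, or None if none can be recovered.

    Hypothetical helper, not part of chrome-prerender.
    """
    # Direct form: "/http://www.nytimes.com/" -> drop the leading slash.
    # Proxy (absolute) form: "http://www.nytimes.com/" -> use as-is.
    if request_target.startswith("/"):
        candidate = request_target[1:]
    else:
        candidate = request_target

    parsed = urlsplit(candidate)
    # Accept only http(s) URLs that actually name a host.
    if parsed.scheme in ("http", "https") and parsed.hostname:
        return candidate
    return None
```

With this kind of normalization, both the curl-style and the Squid-style request would resolve to the same page URL, and anything else (for example `/favicon.ico`) would still be rejected.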
Would you like to send a PR to fix it?
First: thanks a lot for this great piece of software! Second: I am just a dino admin, and this would be my first PR here. What I did was just a crude hack job; it should be done the right way by a coder more competent than me, in order to minimize potential side effects! :) If someone wants to do this and needs to configure a Squid proxy for testing, this is the squid.conf I used (the relevant parts are the cache_peer directive, with Squid running locally on the same VM as prerender, and the "direct ACLs" named "static" and "direct"): [root@prerender ~]# cat /etc/squid/squid.conf
I would like to use chrome-prerender in a proxy sandwich configuration (to cache as much as possible), but Squid as a client sends GET requests in a different form. Any ideas what to configure where?
Curling works fine:
[2017-12-28 17:33:27 +0100] - (sanic.access)[INFO][1:2]: GET http://127.0.0.1:3000/http://www.nytimes.com/ 200 446977
2017-12-28 17:33:27,944 INFO sanic.access.log_response:325
Squid fails:
[2017-12-28 17:34:06 +0100] - (sanic.access)[INFO][1:2]: GET http://www.nytimes.com/ 400 11
2017-12-28 17:34:06,510 INFO sanic.access.log_response:325
[2017-12-28 17:34:11 +0100] [23436] [INFO] KeepAlive Timeout. Closing connection.
2017-12-28 17:34:11,510 INFO root.keep_alive_timeout_callback:193 KeepAlive Timeout. Closing connection.
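The two log excerpts differ only in the request target. As a rough sketch for reproducing both cases without Squid, the helper below builds the raw HTTP/1.1 request in each form. The name `build_request` is hypothetical, and the default `127.0.0.1:3000` address is taken from the curl log above; the exact headers Squid sends are an assumption and will differ in practice.

```python
from urllib.parse import urlsplit


def build_request(target_url: str, via_proxy: bool,
                  prerender_host: str = "127.0.0.1:3000") -> bytes:
    """Build a raw GET request in direct or proxy form (illustrative only)."""
    if via_proxy:
        # Proxy clients put the full URL in the request line and
        # name the origin server in the Host header.
        request_target = target_url
        host = urlsplit(target_url).netloc
    else:
        # Direct clients (the curl case) put the page URL in the path.
        request_target = "/" + target_url
        host = prerender_host
    return (
        f"GET {request_target} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
```

Sending the bytes from `build_request("http://www.nytimes.com/", via_proxy=True)` over a plain socket to the prerender port should exercise the same code path that returned the 400 for Squid.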