
Returns 200 and success even when the URL requested is down or offline the second time [Cache issue?] #26

Open
@kringo

Description

Hello,

First of all, great work creating a declarative scraper. We were testing out the worker using Docker, running it with:

docker run -d -p 8080:8080 montferret/worker

and it's running great. We sent a POST request to it with the payload below and got 200 OK, which is good.

{
  "text": "LET doc = DOCUMENT(@url, { driver: \"cdp\", userAgent: \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome 76.0.3809.87 Safari/537.36\"}) RETURN {}",
  "params": {
    "url": "http://192.168.0.10/test"
  }
}
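For reference, a request like the one above can be reproduced from the command line. This is a hedged sketch: the exact worker endpoint path isn't shown in this report, so the `/` path and the port are assumptions taken from the `docker run` command, and the userAgent option is trimmed for brevity.

```shell
# Store the query payload in a variable (same shape as the payload above,
# userAgent omitted here for brevity) and sanity-check that it is valid JSON.
PAYLOAD='{
  "text": "LET doc = DOCUMENT(@url, { driver: \"cdp\" }) RETURN {}",
  "params": { "url": "http://192.168.0.10/test" }
}'
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload OK"

# Send it to the worker; the POST path "/" is an assumption.
curl -s -o /dev/null -w '%{http_code}\n' \
     -H 'Content-Type: application/json' \
     -d "$PAYLOAD" http://localhost:8080/ || true  # worker may not be running
```

Running this once while the target is up and once after taking it down is a quick way to compare the two status codes side by side.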

However, the problem is that when the URL is down/offline (we intentionally took http://192.168.0.10/test down), we still get the same 200 OK. It looks like the previously successful response is cached, since http://192.168.0.10/test was up when the very first request went through. (If we restart the Docker container while http://192.168.0.10/test is down and then send a fresh request, it returns net::ERR_ADDRESS_UNREACHABLE as expected, which is correct behavior.)

We're not sure whether this is due to Chrome caching it or Ferret caching it?

If it is a cache, is there a way to disable it so that every request hits the live URL instead of using the cached version?
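While the cause is unclear, one possible workaround (not a Ferret feature, just a generic HTTP cache-busting trick) is to append a unique query parameter so that every request targets a distinct URL, which no URL-keyed cache layer can serve from memory. The parameter name `_ts` below is arbitrary:

```shell
# Append a unique timestamp so each request URL differs from the last;
# any HTTP cache keyed on the full URL is then bypassed.
# "_ts" is an arbitrary, hypothetical parameter name.
URL="http://192.168.0.10/test?_ts=$(date +%s)"
echo "$URL"
```

The resulting URL would then be passed as the "url" value in the `params` object of the payload. Whether the target server tolerates an extra query parameter is site-dependent.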

If there is such a flag, how do we pass it to the Docker image?

Appreciate your help, thanks in advance.

Labels: help wanted (Extra attention is needed)