
Browser breaks with new tab on empty content response #288

Closed
@d-balaskas

Description

Hello world!

I am building a generic Spider that crawls sites and collects the requests they make.
I use scrapy-playwright to load each website first and capture the requests that are sent.

I noticed that when I parse URLs whose response body is empty, execution freezes and Playwright's browser shows an empty tab.
To be precise, the problem reproduces when parsing a URL for which the following condition is true:

response_body_text = await response.text()
response_body_text == ''

For URLs where this condition is false, the spider works perfectly!
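The condition above can be demonstrated with a small self-contained sketch (`FakeResponse` is a stand-in for Playwright's response object, used here only for illustration; it is not part of either library):

```python
import asyncio


class FakeResponse:
    """Stand-in for a Playwright response object (illustration only)."""

    def __init__(self, body: str) -> None:
        self._body = body

    async def text(self) -> str:
        return self._body


async def has_empty_body(response) -> bool:
    # The condition from the report: the awaited body text is empty.
    return await response.text() == ""


if __name__ == "__main__":
    # An empty body is the case that triggers the freeze described above...
    print(asyncio.run(has_empty_body(FakeResponse(""))))               # True
    # ...while any non-empty body is handled fine.
    print(asyncio.run(has_empty_body(FakeResponse("<html></html>"))))  # False
```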

For reproduction, I have a fairly common configuration:

CrawlerProcess({
    ...
    # Playwright settings
    'DOWNLOAD_HANDLERS': {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    'TWISTED_REACTOR': "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    'PLAYWRIGHT_BROWSER_TYPE': 'chromium',
    'PLAYWRIGHT_MAX_PAGES_PER_CONTEXT': 10,
    'PLAYWRIGHT_LAUNCH_OPTIONS': {
        'headless': True,
    },
})

and on each scrapy.Request() I pass the following meta:

{
    "playwright": True
}

Has anybody else run into this issue?

Thank you all!


Metadata


    Labels

    bug (Something isn't working)
