Discovered in WCC PR
but turns out it closes the page on error before errorHandler runs ... _handleNavigationTimeout which closes crawlingContext.page.close() is being called on any navigation error
on actor side errorHandler can't reliably interact with ctx.page after a navigation error because the page is already closed by the time it runs.
The culprit is BrowserCrawler._handleNavigationTimeout
|
protected async _handleNavigationTimeout(crawlingContext: Context, error: Error): Promise<void> { |
|
const { session } = crawlingContext; |
|
|
|
if (error && error.constructor.name === 'TimeoutError') { |
|
handleRequestTimeout({ session, errorMessage: error.message }); |
|
} |
|
|
|
await crawlingContext.page.close(); |
|
} |
despite the name, it's called on
every navigation error (timeout, network failure,
Download is starting, frame detached, ...) and unconditionally does
await page.close().
That close is also redundant: _cleanupContext already closes the page in the finally of _runRequestFunction. So the eager close just races the user's errorHandler (and, via activePages--, the browser pool's killer) for no real win.
Where eager close actually helps
Going through the cases:
TimeoutError — page is probably hung on resources, eager close releases them immediately. Genuinely useful.
net::ERR_* (DNS/refused/SSL/etc.) — nothing was loaded, nothing to release.
Download is starting — page is healthy, browser holds the captured Download.
- frame detached / page already closed —
_cleanupContext handles it idempotently anyway.
Only the timeout case justifies eager close.
Discovered in WCC PR
on actor side
errorHandlercan't reliably interact withctx.pageafter a navigation error because the page is already closed by the time it runs.The culprit is
BrowserCrawler._handleNavigationTimeoutcrawlee/packages/browser-crawler/src/internals/browser-crawler.ts
Lines 691 to 699 in d3a29d9
despite the name, it's called on every navigation error (timeout, network failure,
Download is starting, frame detached, ...) and unconditionally doesawait page.close().That close is also redundant:
_cleanupContextalready closes the page in thefinallyof_runRequestFunction. So the eager close just races the user'serrorHandler(and, viaactivePages--, the browser pool's killer) for no real win.Where eager close actually helps
Going through the cases:
TimeoutError— page is probably hung on resources, eager close releases them immediately. Genuinely useful.net::ERR_*(DNS/refused/SSL/etc.) — nothing was loaded, nothing to release.Download is starting— page is healthy, browser holds the capturedDownload._cleanupContexthandles it idempotently anyway.Only the timeout case justifies eager close.