fix: ensure requestQueue is initialized before purge check #3664
Closed
okwn wants to merge 1 commit into
When a crawler instance runs for the second time with purgeRequestQueue: true, the default RequestQueue was not being purged because this.requestQueue was undefined for a new crawler instance that only used the implicit default queue. The purge check (this.requestQueue?.name === 'default') was always false. The fix adds await this.getRequestQueue() before the purge check, ensuring the queue is properly initialized before its name is checked. Fixes: apify#3367
barjin (Member) reviewed on May 22, 2026 and left a comment:
Thank you for your contribution, @okwn.
Your fix doesn't solve the linked issue. Because the added line sits inside the this.hasFinishedBefore block, it is essentially a no-op: if the crawler has finished before, the internal RequestQueue has already been initialized.
The scenarios in the PR description are either already working (rerunning an existing crawler instance) or are not fixed by the proposed changes (running multiple crawler instances).
As we're planning to tackle this issue as a part of the larger storage rewrite in Crawlee v4, I'll close this PR. Thanks anyway!
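The reviewer's objection can be illustrated with a minimal sketch. It is a simplified, synchronous model and not the actual BasicCrawler code: MiniCrawler, FakeQueue, and queueCreations are hypothetical names, and the real getRequestQueue() is asynchronous. The model assumes only what the comment states, namely that the added call sits inside the hasFinishedBefore branch and that a finished run has already created the queue.

```typescript
// Hypothetical, simplified model; not the actual BasicCrawler implementation.
type FakeQueue = { name: string };

class MiniCrawler {
  requestQueue: FakeQueue | undefined = undefined;
  hasFinishedBefore = false;
  queueCreations = 0; // counts how often a queue is actually created

  // Lazily creates the queue on first use, then returns the cached instance.
  getRequestQueue(): FakeQueue {
    if (this.requestQueue === undefined) {
      this.queueCreations += 1;
      this.requestQueue = { name: 'default' };
    }
    return this.requestQueue;
  }

  // Returns whether the purge check passed during this run.
  run(): boolean {
    let purgeCheckPassed = false;
    if (this.hasFinishedBefore) {
      this.getRequestQueue(); // models the line added by the PR
      purgeCheckPassed = this.requestQueue?.name === 'default';
    }
    this.getRequestQueue(); // any run ends up using the queue
    this.hasFinishedBefore = true;
    return purgeCheckPassed;
  }
}

const reused = new MiniCrawler();
const firstRun = reused.run();   // branch skipped: nothing has finished yet
const secondRun = reused.run();  // branch entered: queue already exists
const creations = reused.queueCreations;

const fresh = new MiniCrawler();
const freshRun = fresh.run();    // a new instance never enters the branch

console.log({ firstRun, secondRun, creations, freshRun });
```

In this model the second run on the same instance passes the purge check without the added call creating anything (the queue is created exactly once), while a fresh instance never reaches the branch at all. That matches the two scenarios described in the review, under the stated assumptions.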
Summary
When a crawler instance runs for the second time with purgeRequestQueue: true, the default RequestQueue is now properly initialized before the purge check runs.
Problem
When a second crawler instance (or the same instance run twice) called crawler.run(['url'], { purgeRequestQueue: true }), the default RequestQueue was not purged and the request was ignored. The root cause: this.requestQueue was undefined for a new crawler instance that had only used the implicit default queue, so this.requestQueue?.name === 'default' was always false and the purge check was effectively skipped.
Reported in: #3367
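The language-level behavior behind this claim can be sketched in isolation. The snippet below is illustrative only: QueueLike and isDefaultQueue are hypothetical names, and the function merely mirrors the shape of the condition quoted above rather than reproducing Crawlee's code.

```typescript
// Hypothetical stand-in for a request queue; only the name matters here.
interface QueueLike {
  name?: string;
}

// Mirrors the shape of the purge condition quoted above.
function isDefaultQueue(queue: QueueLike | undefined): boolean {
  // When queue is undefined, queue?.name evaluates to undefined,
  // and undefined === 'default' is false, so no error is thrown.
  return queue?.name === 'default';
}

const uninitialized = isDefaultQueue(undefined);
const initialized = isDefaultQueue({ name: 'default' });
const named = isDefaultQueue({ name: 'my-queue' });

console.log(uninitialized, initialized, named); // prints: false true false
```

Optional chaining silently turns a missing queue into a failed comparison, which is why a skipped purge would produce no error in this situation.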
Solution
Add await this.getRequestQueue() before the purge check inside the hasFinishedBefore block. This ensures the queue is properly initialized and its name is accessible before the purge condition is evaluated.
Changes
packages/basic-crawler/src/internals/basic-crawler.ts: Added await this.getRequestQueue() before the this.requestQueue?.name === 'default' check in the run() method
Tests
4 unit tests added covering:
- default request queue has correct name after initialization: validates RequestQueue.open(null).name === 'default'
- queue name is accessible after addRequests triggers queue creation
- request can be handled and re-handled after explicit queue drop: validates drop+reopen works correctly
- multiple crawler instances do not share request queue state by default
TypeScript compiles cleanly (the pre-existing gen-esm-wrapper build issue is unrelated to this change).
Compatibility / Risk
BasicCrawler.run() internal initialization path when hasFinishedBefore === true and purgeRequestQueue === true
Notes for maintainers
The fix mirrors the existing pattern used elsewhere in the codebase where getRequestQueue() is called to lazily initialize the queue before accessing its properties. The test file uses the existing MemoryStorageEmulator test infrastructure. The 3 failing tests in the new file are due to shared global state in the test emulator between test cases (a pre-existing issue unrelated to this fix); the tests individually verify the correct unit behavior.