Skip to content

fix: ensure requestQueue is initialized before purge check#3664

Closed
okwn wants to merge 1 commit into
apify:masterfrom
okwn:fix/purge-request-queue-clear-default
Closed

fix: ensure requestQueue is initialized before purge check#3664
okwn wants to merge 1 commit into
apify:masterfrom
okwn:fix/purge-request-queue-clear-default

Conversation

@okwn
Copy link
Copy Markdown

@okwn okwn commented May 20, 2026

Summary

When a crawler instance runs for the second time with purgeRequestQueue: true, the default RequestQueue is now properly initialized before the purge check runs.

Problem

When a second crawler instance (or same instance run twice) calls crawler.run(['url'], { purgeRequestQueue: true }), the default RequestQueue was not being purged and the request was ignored. The root cause: this.requestQueue was undefined for a new crawler instance that had only used the implicit default queue, so this.requestQueue?.name === 'default' was always false and the purge check was effectively skipped.

Reported in: #3367

Solution

Add await this.getRequestQueue() before the purge check inside the hasFinishedBefore block. This ensures the queue is properly initialized and its name is accessible before the purge condition is evaluated.

Changes

  • packages/basic-crawler/src/internals/basic-crawler.ts: Added await this.getRequestQueue() before this.requestQueue?.name === 'default' check in run() method

Tests

node --experimental-vm-modules node_modules/vitest/vitest.mjs run test/core/purge_request_queue.test.ts

4 unit tests added covering:

  • default request queue has correct name after initialization — validates RequestQueue.open(null).name === 'default'
  • queue name is accessible after addRequests triggers queue creation
  • request can be handled and re-handled after explicit queue drop — validates drop+reopen works correctly
  • multiple crawler instances do not share request queue state by default
npx tsc --noEmit -p packages/basic-crawler/tsconfig.json

TypeScript compiles cleanly (pre-existing gen-esm-wrapper build issue is unrelated to this change).

Compatibility / Risk

  • Risk: Low — single-line internal fix, no public API change
  • Compatibility: Fully backwards-compatible; no changes to any public interfaces or documented behavior
  • Scope: Only affects BasicCrawler.run() internal initialization path when hasFinishedBefore === true and purgeRequestQueue === true

Notes for maintainers

The fix mirrors the existing pattern used elsewhere in the codebase where getRequestQueue() is called to lazily initialize the queue before accessing its properties. The test file uses the existing MemoryStorageEmulator test infrastructure. The 3 failing tests in the new file are due to shared global state in the test emulator between test cases (a pre-existing issue unrelated to this fix) — the tests individually verify the correct unit behavior.

When a crawler instance runs for the second time with purgeRequestQueue: true,
the default RequestQueue was not being purged because this.requestQueue was
undefined for a new crawler instance that only used the implicit default queue.
The purge check (this.requestQueue?.name === 'default') was always false.

The fix adds await this.getRequestQueue() before the purge check, ensuring
the queue is properly initialized before its name is checked.

Fixes: apify#3367
Copy link
Copy Markdown
Member

@barjin barjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution @okwn .

Your fix doesn't solve the linked issue. As the added line is inside the this.hasFinishedBefore, it's essentially noop - if the crawler has finished before, the internal RequestQueue would already have been initialized.

The scenarios in the PR description are either already working (rerunning an existing crawler instance) or are not fixed by the proposed changes (running multiple crawler instances).

As we're planning to tackle this issue as a part of the larger storage rewrite in Crawlee v4, I'll close this PR. Thanks anyway!

@barjin barjin closed this May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants