Multiple crawler instances share useState state #3024

@barjin

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/basic (BasicCrawler)

Issue description

When multiple crawler instances exist in the same process, their useState methods (both on the crawler instance and in the requestHandler context parameter) always resolve to the same state object.

The API does not suggest this: crawler.useState reads as if it should resolve to state internal to that crawler. If the sharing is intentional, it IMO requires better docs.

Code sample

import { CheerioCrawler } from '@crawlee/cheerio';

async function main() {
    function createCrawler() {
        return new CheerioCrawler({
            requestHandler: async ({ request, useState }) => {
                const state = await useState<string[]>([]);
                state.push(request.url);
            },
        });
    }

    const [crawler1, crawler2] = [createCrawler(), createCrawler()];

    await crawler1.run(['https://example.com']);
    await crawler2.run(['https://example.org']);

    console.log(crawler1 === crawler2); // false: two distinct crawler instances
    console.log(await crawler1.useState() === await crawler2.useState()); // true: both resolve to the same state object
    console.log(await crawler1.useState()); // ['https://example.com', 'https://example.org']
}

main();
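
One way to picture the behavior above is a library-free model. This is only a sketch under an assumption: that useState is backed by a single fixed key in a store shared by every crawler in the process. Nothing here is Crawlee code; `sharedStore`, `getAutoSavedValue`, `ModelCrawler`, and the `STATE` key names are hypothetical and exist only for illustration. The model also shows how deriving the key per instance would isolate the state.

```typescript
// Hypothetical model, not Crawlee code: one process-wide store shared by all crawlers.
const sharedStore = new Map<string, unknown>();

// Returns the value stored under `key`, seeding it with `defaultValue` on first use.
async function getAutoSavedValue<T>(key: string, defaultValue: T): Promise<T> {
    if (!sharedStore.has(key)) sharedStore.set(key, defaultValue);
    return sharedStore.get(key) as T;
}

let nextId = 0;

class ModelCrawler {
    private readonly id: number;
    private readonly perInstanceKey: boolean;

    constructor(perInstanceKey: boolean) {
        this.id = nextId++;
        this.perInstanceKey = perInstanceKey;
    }

    async useState<T>(defaultValue: T): Promise<T> {
        // A fixed key means every instance sees the same object;
        // a key derived from the instance id keeps the state separate.
        const key = this.perInstanceKey ? `STATE_${this.id}` : 'STATE';
        return getAutoSavedValue(key, defaultValue);
    }
}

async function demo(): Promise<boolean[]> {
    // Fixed key: both crawlers resolve to the same array.
    const a = new ModelCrawler(false);
    const b = new ModelCrawler(false);
    const shared = (await a.useState<string[]>([])) === (await b.useState<string[]>([]));

    // Per-instance key: each crawler gets its own array.
    const c = new ModelCrawler(true);
    const d = new ModelCrawler(true);
    const isolated = (await c.useState<string[]>([])) !== (await d.useState<string[]>([]));

    return [shared, isolated];
}
```

If the assumption holds, the fixed-key branch matches what the code sample prints, and the per-instance branch is roughly what the API shape led me to expect.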

Package version

3.13.8

Node.js version

Node 22

Operating system

Linux

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

Labels

bug, t-tooling