feat: stopping the crawlers gracefully with BasicCrawler.stop() #2792

Merged
merged 12 commits into from
Jan 20, 2025
feat: make BasicCrawler.stop synchronous (parity w/ Python)
barjin committed Jan 6, 2025
commit 55210ecf9b49ed0d4c304396481652be0f37f010
15 changes: 9 additions & 6 deletions packages/basic-crawler/src/internals/basic-crawler.ts
@@ -977,13 +977,16 @@ export class BasicCrawler<Context extends CrawlingContext = BasicCrawlingContext
     /**
      * Gracefully stops the current run of the crawler.
      *
-     * This method will wait for all running tasks to finish. Only once all tasks are finished, the method will resolve.
-     *
-     * **WARNING:** If this method is called (and awaited) from a task (e.g. in the `requestHandler`), it will wait indefinitely, as the task will never finish.
+     * All the tasks active at the time of calling this method will be allowed to finish.
      */
-    async stop() {
-        await this.autoscaledPool?.pause(); // Gracefully starve the this.autoscaledPool, so it doesn't start new tasks. Resolves once the pool is cleared.
-        await this.autoscaledPool?.abort(); // Resolves the `autoscaledPool.run()` promise in the `BasicCrawler.run()` method. Since the pool is already paused, it resolves immediately and doesn't kill any tasks.
+    stop(message: string = 'This crawler has been gracefully stopped.') {
+        this.autoscaledPool
+            ?.pause() // Gracefully starve the this.autoscaledPool, so it doesn't start new tasks. Resolves once the pool is cleared.
+            .then(async () => this.autoscaledPool?.abort()) // Resolves the `autoscaledPool.run()` promise in the `BasicCrawler.run()` method. Since the pool is already paused, it resolves immediately and doesn't kill any tasks.
+            .then(() => this.log.info(message))
+            .catch((err) => {
+                this.log.error('Error stopping the crawler:', err);
+            });
     }
 
     async getRequestQueue() {
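The change in this commit makes `stop()` a fire-and-forget call: it starts the pause-then-abort chain and returns immediately instead of awaiting it. The sketch below is one way to see why that avoids the deadlock described in the removed WARNING when `stop()` is called from inside a task. It does not use crawlee itself; `FakePool`, `MiniCrawler`, and `demo` are hypothetical stand-ins written for illustration, and the pool's behavior is a simplified assumption, not the real `AutoscaledPool` implementation.

```typescript
// Hypothetical, simplified stand-in for a task pool (NOT crawlee's AutoscaledPool).
class FakePool {
    private running = 0;
    private paused = false;
    private aborted = false;
    private idleWaiters: Array<() => void> = [];

    // Starts a task unless the pool is paused or aborted; returns whether it started.
    start(task: () => Promise<void>): boolean {
        if (this.paused || this.aborted) return false;
        this.running++;
        const onDone = () => {
            this.running--;
            if (this.running === 0) this.idleWaiters.splice(0).forEach((resolve) => resolve());
        };
        task().then(onDone, onDone);
        return true;
    }

    // Blocks new tasks; resolves once all running tasks have finished.
    async pause(): Promise<void> {
        this.paused = true;
        if (this.running === 0) return;
        await new Promise<void>((resolve) => this.idleWaiters.push(resolve));
    }

    // Marks the pool as aborted; assumed to be called only after pause() resolved.
    async abort(): Promise<void> {
        this.aborted = true;
    }

    get isAborted(): boolean {
        return this.aborted;
    }
}

// Hypothetical crawler mirroring the shape of the new stop() method.
class MiniCrawler {
    readonly log: string[] = [];

    constructor(readonly pool?: FakePool) {}

    // Synchronous: kicks off the chain and returns without waiting for it.
    stop(message = 'This crawler has been gracefully stopped.'): void {
        this.pool
            ?.pause()
            .then(async () => this.pool?.abort())
            .then(() => {
                this.log.push(message);
            })
            .catch((err) => {
                this.log.push(`Error stopping the crawler: ${String(err)}`);
            });
    }
}

interface DemoResult {
    handlerFinished: boolean;
    secondStarted: boolean;
    aborted: boolean;
    log: string[];
}

async function demo(): Promise<DemoResult> {
    const pool = new FakePool();
    const crawler = new MiniCrawler(pool);
    let handlerFinished = false;

    // The task calls stop() on its own crawler, then keeps working.
    pool.start(async () => {
        crawler.stop(); // returns immediately, so the task is free to finish
        await new Promise<void>((resolve) => setTimeout(resolve, 10));
        handlerFinished = true;
    });

    // The pool is already paused, so no new task is accepted.
    const secondStarted = pool.start(async () => {});

    await new Promise<void>((resolve) => setTimeout(resolve, 50));
    return { handlerFinished, secondStarted, aborted: pool.isAborted, log: crawler.log };
}
```

In this sketch, awaiting `pause()` from inside the task would never resolve, because `pause()` waits for that same task to finish. Not awaiting lets the task complete, after which the chain aborts the pool and logs the message. When `pool` is undefined, optional chaining short-circuits the whole chain and `stop()` does nothing.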