
Saving and resuming crawls

Péter Bencze edited this page May 31, 2019 · 4 revisions

Save and resume crawling sessions

When crawling large sites, it can be useful to save a crawl in progress and resume it later instead of starting over.

Obtain state of the crawler

Use the getState method to obtain the current state of the crawler. This state object can be serialized and later used to restore the crawler to that state.
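Below is a minimal sketch of writing a state object to a file and reading it back with standard Java object serialization. It assumes the object returned by getState implements java.io.Serializable. The page says the state can be serialized but does not name the mechanism, so this is an assumption. StateFiles and DemoState are hypothetical names. DemoState only stands in for the real state class so that the example runs on its own.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class StateFiles {

    // Writes a Serializable object (assumed: the object returned by getState) to a file,
    // creating the file or overwriting its previous contents.
    public static void save(Serializable state, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(state);
        }
    }

    // Reads the object back and casts it to the expected type.
    public static <T> T load(Path file, Class<T> type) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return type.cast(in.readObject());
        }
    }

    // Hypothetical stand-in for the crawler's state object, used only for this demo.
    static class DemoState implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> pendingUrls = new ArrayList<>();
    }

    public static void main(String[] args) throws Exception {
        DemoState state = new DemoState();
        state.pendingUrls.add("https://example.com/page2");

        Path file = Files.createTempFile("crawler-state", ".ser");
        save(state, file);

        DemoState restored = load(file, DemoState.class);
        System.out.println(restored.pendingUrls); // prints [https://example.com/page2]
    }
}
```

Built-in serialization keeps the sketch dependency-free. One consequence is that the saved file can only be read back by a compatible version of the state class, so a state saved with one library version may not load after an upgrade.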

Resume state of the crawler

The crawler has a constructor that takes a state object as its argument. Once the crawler has been recreated from the state object, call one of the resume methods to continue the crawl.

Example

// Load previously saved state from somewhere
// CrawlerState previousState = ...

// Resume crawl
MyCrawler crawler = new MyCrawler(previousState);
crawler.resume();
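The toy sketch below illustrates the idea behind the state object, the state-taking constructor and resuming. It does not depend on any library and is not the library's implementation. ResumableCrawlSketch, State, crawl and the in-memory link map are all hypothetical. "Fetching" a page only looks up its links in a map, so no HTTP requests are made. The code requires Java 9 or later because it uses List.of and Map.of.

```java
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical, library-independent sketch of the save/resume pattern.
public class ResumableCrawlSketch {

    // Snapshot of everything needed to continue: URLs still to visit and URLs already visited.
    static class State implements Serializable {
        private static final long serialVersionUID = 1L;
        final ArrayDeque<String> frontier;
        final LinkedHashSet<String> visited;

        State(ArrayDeque<String> frontier, LinkedHashSet<String> visited) {
            this.frontier = frontier;
            this.visited = visited;
        }
    }

    private final Map<String, List<String>> links; // stand-in for fetching pages and extracting links
    private final ArrayDeque<String> frontier;
    private final LinkedHashSet<String> visited;

    // Starts a fresh crawl from a seed URL.
    ResumableCrawlSketch(Map<String, List<String>> links, String seed) {
        this.links = links;
        this.frontier = new ArrayDeque<>(List.of(seed));
        this.visited = new LinkedHashSet<>();
    }

    // Recreates a crawler from a saved state; copies it so the snapshot stays unchanged.
    ResumableCrawlSketch(Map<String, List<String>> links, State state) {
        this.links = links;
        this.frontier = new ArrayDeque<>(state.frontier);
        this.visited = new LinkedHashSet<>(state.visited);
    }

    // Returns a copy of the current progress.
    State getState() {
        return new State(new ArrayDeque<>(frontier), new LinkedHashSet<>(visited));
    }

    // Visits at most maxPages new pages in breadth-first order, then returns.
    void crawl(int maxPages) {
        int count = 0;
        while (!frontier.isEmpty() && count < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already visited via another link
            }
            count++;
            for (String next : links.getOrDefault(url, List.of())) {
                if (!visited.contains(next)) {
                    frontier.add(next);
                }
            }
        }
    }

    // Continues until there is nothing left to visit.
    void resume() {
        crawl(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = Map.of(
                "/", List.of("/a", "/b"),
                "/a", List.of("/c"),
                "/b", List.of("/c"));

        ResumableCrawlSketch first = new ResumableCrawlSketch(site, "/");
        first.crawl(2); // visits "/" and "/a", then stops
        State saved = first.getState();

        ResumableCrawlSketch second = new ResumableCrawlSketch(site, saved);
        second.resume();
        System.out.println(second.visited); // prints [/, /a, /b, /c]
    }
}
```

Both getState and the state-taking constructor copy the collections. As a result, the saved snapshot does not change while the resumed crawler keeps working, and the same snapshot can be used to resume more than once.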