In production, outputting log files to Gluster, we occasionally see problems with log file rotation that lead to checkpoint failure (as mentioned in #392).
SEVERE: org.archive.crawler.framework.CheckpointService checkpointFailed Checkpoint failed [Fri May 28 10:28:03 GMT 2021]
java.io.IOException: Unable to move /heritrix/output/frequent-npld/20210519154706/logs/crawl.log to /heritrix/output/frequent-npld/20210519154706/logs/crawl.log.cp00032-20210528102802
    at org.archive.io.GenerationFileHandler.rotate(GenerationFileHandler.java:127)
    at org.archive.crawler.reporting.BufferedCrawlerLoggerModule.rotateLogFiles(BufferedCrawlerLoggerModule.java:331)
    at org.archive.crawler.reporting.BufferedCrawlerLoggerModule.doCheckpoint(BufferedCrawlerLoggerModule.java:393)
    at org.archive.crawler.framework.CheckpointService.requestCrawlCheckpoint(CheckpointService.java:285)
    ...
Prior to the checkpoint failure, there are missing log files, e.g. no crawl.log / alerts.log. Instead, usually, the checkpoint-version of the file is still being written to, e.g. crawl.log.cp00011-xxxxx. In rare cases, the underlying Java logger FileHandler rotation appears to have kicked in, because we found a crawl.log.1 file that was being written to.
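The symptoms above could be watched for before a checkpoint is attempted. The following is only a rough sketch: `LogRotationCheck` and `findAnomalies` are hypothetical names, not part of Heritrix, and the file-name patterns are assumptions based on the examples in this report (`crawl.log`, `crawl.log.1`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic helper, not Heritrix code.
class LogRotationCheck {

    // Reports expected live logs that are missing, and numbered files such as
    // crawl.log.1 that suggest java.util.logging's own rotation has kicked in.
    static List<String> findAnomalies(Path logsDir, List<String> expectedLogs) throws IOException {
        List<String> names;
        try (Stream<Path> entries = Files.list(logsDir)) {
            names = entries.map(p -> p.getFileName().toString())
                           .sorted()
                           .collect(Collectors.toList());
        }
        List<String> anomalies = new ArrayList<>();
        for (String log : expectedLogs) {
            if (!names.contains(log)) {
                anomalies.add("missing: " + log);
            }
            // Assumed pattern: "<log>.<digits>", e.g. crawl.log.1
            Pattern numbered = Pattern.compile(Pattern.quote(log) + "\\.\\d+");
            for (String name : names) {
                if (numbered.matcher(name).matches()) {
                    anomalies.add("unexpected FileHandler generation: " + name);
                }
            }
        }
        return anomalies;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LogRotationCheck <logs-directory>");
            return;
        }
        findAnomalies(Path.of(args[0]), List.of("crawl.log", "alerts.log"))
                .forEach(System.out::println);
    }
}
```

Run against the job's `logs` directory, this prints one line per anomaly and nothing when the expected files are in place. It does not detect a checkpoint-suffixed file that is still being written to; that would need a check on modification times.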
This is presumably some kind of threading issue or race condition in GenerationFileHandler, perhaps brought on by Gluster occasionally blocking when performing file operations. Unfortunately, this is just a guess, and I don't know how to reproduce this error.
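If the rotation relies on a rename that only reports failure without a cause (an assumption, not verified against the Heritrix source), one way to learn more would be to move the file with `java.nio.file.Files.move`, which throws an exception describing why the move failed, and to retry a few times in case the filesystem is only blocking temporarily. This is a minimal sketch, not a proposed patch: `RotateWithRetry` and `moveWithRetry` are hypothetical names, and the retry count and back-off are arbitrary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not Heritrix code.
class RotateWithRetry {

    // Moves source to target, retrying a bounded number of times and keeping
    // the last underlying exception as the cause of the final failure.
    static void moveWithRetry(Path source, Path target, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // ATOMIC_MOVE requests a single rename rather than copy-and-delete.
                Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
                return;
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear back-off; the values are arbitrary assumptions.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw new IOException("Unable to move " + source + " to " + target
                + " after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("rotate-sketch");
        Path log = Files.writeString(dir.resolve("crawl.log"), "line\n");
        Path rotated = dir.resolve("crawl.log.cp00001-example");
        moveWithRetry(log, rotated, 3, 100L);
        System.out.println("rotated: " + Files.exists(rotated));
    }
}
```

A retry would not fix a genuine race between threads, but the chained cause (for example a `NoSuchFileException` for a missing `crawl.log`) might help narrow down which situation is occurring.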