BookKeeper switched to readonly after FileInfoDeletedException #1919
Comments
@athanatos @jvrao @reddycharan I think this exception comes from one of your recent commits. Have you seen this error before? Cc @sijie
@eolivelli I haven't seen it. FileInfoDeletedException was added with the local consistency checker to indicate a call on a FileInfo after the GC had removed the ledger. In this case, I think SortedLedgerStorage is flushing entries for a ledger that no longer exists. What exception should be propagated in that case?
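To make the failure mode concrete, here is a minimal sketch of the guard pattern described in the comment above. Everything in it is hypothetical: `SimpleFileInfo`, `markDeleted()` and the exception class are simplified stand-ins, not BookKeeper's real `FileInfo`. Once the garbage collector marks the file deleted, every accessor runs `checkOpen()` first and throws, which matches the `FileInfo.checkOpen` / `FileInfo.size` frames at the top of the stack trace in the bug report.

```java
import java.io.IOException;

public class FileInfoSketch {

    // Hypothetical stand-in for FileInfo.FileInfoDeletedException.
    static class FileInfoDeletedException extends IOException {
        FileInfoDeletedException() {
            super("FileInfo already deleted");
        }
    }

    // Hypothetical, heavily simplified model of a ledger index file handle.
    static class SimpleFileInfo {
        private final long sizeBytes;
        private boolean deleted = false;

        SimpleFileInfo(long sizeBytes) {
            this.sizeBytes = sizeBytes;
        }

        // Called by the (hypothetical) garbage collector once the ledger is removed.
        synchronized void markDeleted() {
            deleted = true;
        }

        // Guard run at the start of every accessor; fails once the file is deleted.
        private void checkOpen() throws FileInfoDeletedException {
            if (deleted) {
                throw new FileInfoDeletedException();
            }
        }

        synchronized long size() throws IOException {
            checkOpen();
            return sizeBytes;
        }
    }

    public static void main(String[] args) throws IOException {
        SimpleFileInfo fi = new SimpleFileInfo(1024L);
        System.out.println("size before GC: " + fi.size());
        fi.markDeleted();
        try {
            fi.size();
        } catch (FileInfoDeletedException e) {
            // A late caller (e.g. a memtable flush) now sees the deletion as an exception.
            System.out.println("size after GC: " + e.getMessage());
        }
    }
}
```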
Thank you for checking. The main problem is that the bookie switched to read-only mode. We are using the official 4.7.3 release.
@eolivelli Yeah, I was wrong on that one: I apparently added that bit to address a different race condition and later reused it. I think the bug is that grabCleanPage shouldn't be able to obtain a page in that state; I'm taking a look.
@athanatos Thank you.
@eolivelli How many times has this occurred?
I'll leave the answer to @aluccaroni.
It looks like EntryMemTable.flushSnapshot tolerates a NoLedgerException, but prior to my patch I think this race would have resulted in a ChannelClosedException, in which case the bookie would still have transitioned to read-only. I think the right answer is for putEntryOffset to translate the FileInfoDeletedException into a NoLedgerException.
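A minimal sketch of the proposed fix, under stated assumptions: `Index`, `writeOffset` and both exception classes are hypothetical stand-ins for the real `IndexInMemPageMgr`/`LedgerCacheImpl` path and BookKeeper's exception types, not their actual code. `putEntryOffset` catches the low-level `FileInfoDeletedException` and rethrows it as a `NoLedgerException` (keeping the original as the cause), so a flush loop modelled on `EntryMemTable.flushSnapshot` can skip entries of a ledger deleted mid-flush while still propagating any other `IOException`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FlushSketch {

    // Hypothetical stand-ins for the exception types named in the thread.
    static class FileInfoDeletedException extends IOException {
        FileInfoDeletedException() { super("FileInfo already deleted"); }
    }

    static class NoLedgerException extends IOException {
        NoLedgerException(long ledgerId, Throwable cause) { super("No ledger " + ledgerId, cause); }
    }

    // Hypothetical index: ledgers listed in 'deleted' have been garbage collected.
    static class Index {
        final Set<Long> deleted = new HashSet<>();
        final List<String> written = new ArrayList<>();

        // Low-level write that fails the way FileInfo.checkOpen() does after GC.
        private void writeOffset(long ledgerId, long entryId, long offset) throws IOException {
            if (deleted.contains(ledgerId)) {
                throw new FileInfoDeletedException();
            }
            written.add(ledgerId + ":" + entryId + "@" + offset);
        }

        // Proposed behaviour: report a GC'd ledger as "no such ledger", not a generic I/O failure.
        void putEntryOffset(long ledgerId, long entryId, long offset) throws IOException {
            try {
                writeOffset(ledgerId, entryId, offset);
            } catch (FileInfoDeletedException e) {
                throw new NoLedgerException(ledgerId, e);
            }
        }
    }

    // Flush loop: skips entries of vanished ledgers, propagates any other IOException.
    static int flushSnapshot(Index index, long[][] entries) throws IOException {
        int skipped = 0;
        for (long[] e : entries) {
            try {
                index.putEntryOffset(e[0], e[1], e[2]);
            } catch (NoLedgerException nle) {
                skipped++; // ledger was deleted concurrently; nothing left to persist
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        Index index = new Index();
        index.deleted.add(7L); // ledger 7 was GC'd while its entries sat in the memtable
        long[][] snapshot = { {5L, 0L, 100L}, {7L, 0L, 200L}, {5L, 1L, 300L} };
        int skipped = flushSnapshot(index, snapshot);
        System.out.println("written=" + index.written + " skipped=" + skipped);
    }
}
```

Running `main` persists the two entries for ledger 5 and skips the one for the deleted ledger 7. The translation keeps the narrow, well-understood case (the ledger is gone) apart from genuine I/O failures, which presumably should still be allowed to push the bookie into read-only mode.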
@athanatos We have seen it only once in 3 weeks (since we put v4.7.3 into production).
@athanatos Are you working on a fix for this?
Yeah, sorry, I was on vacation last week. I've got a patch I'll put up today.
BUG REPORT
During normal operation of a cluster of 3 bookies, one switched to READONLY mode. We usually see this kind of error when a disk becomes full, but this time we found a FileInfoDeletedException in the logs. We restarted the bookie and everything returned to normal.
Apache BookKeeper 4.7.3
Java 11.0.2+7
What did you do?
n/a
What did you expect to see?
no error/no readonly mode
What did you see instead?
The Bookie switched to readonly mode
See the stack trace logged by org.apache.bookkeeper.bookie.SortedLedgerStorage:
19-01-30-09-48-40 org.apache.bookkeeper.bookie.FileInfo$FileInfoDeletedException: FileInfo already deleted
org.apache.bookkeeper.bookie.FileInfo$FileInfoDeletedException: FileInfo already deleted
    at org.apache.bookkeeper.bookie.FileInfo.checkOpen(FileInfo.java:248)
    at org.apache.bookkeeper.bookie.FileInfo.checkOpen(FileInfo.java:242)
    at org.apache.bookkeeper.bookie.FileInfo.size(FileInfo.java:342)
    at org.apache.bookkeeper.bookie.IndexPersistenceMgr.updatePage(IndexPersistenceMgr.java:643)
    at org.apache.bookkeeper.bookie.IndexInMemPageMgr.grabLedgerEntryPage(IndexInMemPageMgr.java:470)
    at org.apache.bookkeeper.bookie.IndexInMemPageMgr.getLedgerEntryPage(IndexInMemPageMgr.java:435)
    at org.apache.bookkeeper.bookie.IndexInMemPageMgr.putEntryOffset(IndexInMemPageMgr.java:594)
    at org.apache.bookkeeper.bookie.LedgerCacheImpl.putEntryOffset(LedgerCacheImpl.java:96)
    at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.processEntry(InterleavedLedgerStorage.java:433)
    at org.apache.bookkeeper.bookie.SortedLedgerStorage.process(SortedLedgerStorage.java:184)
    at org.apache.bookkeeper.bookie.EntryMemTable.flushSnapshot(EntryMemTable.java:251)
    at org.apache.bookkeeper.bookie.EntryMemTable.flush(EntryMemTable.java:205)
    at org.apache.bookkeeper.bookie.SortedLedgerStorage$1.run(SortedLedgerStorage.java:213)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)