-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bboltcachestorage: mitigate corrupt boltdb cache after panic #4981
Conversation
solver/bboltcachestorage/storage.go
Outdated
// fallbackOpenDB performs the panic recovery, database backup, and | ||
// opening of the new database file when a panic happens from safeOpenDB. | ||
func fallbackOpenDB(dbPath string, db **bolt.DB, err *error) { | ||
r := recover() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would only call recover()
inside inline defer. It shouldn't be possible to reach it by calling a regular function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the recover portion to an inline defer. The fallbackOpenDB
is now just printing the log message, performing the move, and opening the new database.
solver/bboltcachestorage/storage.go
Outdated
} | ||
|
||
// initStore initializes the store with the required buckets. | ||
func initStore(db *bolt.DB) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this moved? Doesn't seem to have an effect on recover/defer logic and called only once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thought it would make things easier to read while I was modifying things. I can revert it if you prefer but I thought it would be easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a long function (yet) and only has one caller so I don't think there is a need to slice it. There is also no scope difference as "new" and "init" mean the same thing, just with a different word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this back. It's not the primary purpose of this PR and the initialization isn't really that complicated anyway.
solver/bboltcachestorage/storage.go
Outdated
// safeOpenDB opens a bolt database and recovers from panic that | ||
// can be caused by a corrupted database file. | ||
func safeOpenDB(dbPath string) (db *bolt.DB, err error) { | ||
defer fallbackOpenDB(dbPath, &db, &err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we are already adding a fallback then it should cover all the possible cases where boltdb finds a corrupted library that it can't open. Fallback is cheap and can't happen for regular reasons. Don't want to miss out something and wait for another release cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've modified this to also apply to any error that occurs when os.Stat
otherwise returns without error. So if stat fails for some reason (generally the file does not exist or is not readable), then we'll perform the fallback.
5be8552
to
119abaa
Compare
solver/bboltcachestorage/storage.go
Outdated
return openDB(dbPath) | ||
} | ||
|
||
// fallbackOpenDB performs the panic recovery, database backup, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"performs database backup and opening of the new database file. Called after the first db.Open has failed on reading the file."
119abaa
to
e98763d
Compare
There are some reports that the nosync configuration of the boltdb can cause panics on restarts due to corruption of the database. Mitigate by panic recovery until there is a better solution. Co-authored-by: Tonis Tiigi <tonistiigi@gmail.com> Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
e98763d
to
ccc06b7
Compare
There are some reports that the nosync configuration of the boltdb can
cause panics on restarts due to corruption of the database. Mitigate by
panic recovery until there is a better solution.