-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand auto recovery to background read errors #9679
Conversation
@anand1976 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@anand1976 has updated the pull request. You must reimport the pull request before landing. |
@anand1976 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @anand1976 for the PR. Left a few comments.
@@ -316,24 +309,18 @@ Status DBImpl::FlushMemTableToOutputFile( | |||
// error), all the Manifest write will be map to soft error. | |||
// TODO: kManifestWriteNoWAL and kFlushNoWAL are misleading. Refactor is | |||
// needed. | |||
error_handler_.SetBGError(io_s, | |||
error_handler_.SetBGError(s, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe s
can be non-ok and log_io_s
is ok, while WAL has actually happened. Therefore, the ErrorReason here is incorrect? Or is the name kManifestWriteNoWAL
confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks so according to the comment in line 310, and we can address this in future PRs. We can put some comments to explain the logic error handling based on reason, status, etc. in an easy-to-find place, if feasible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. This could happen WAL or no WAL. Probably need a new reason, or maybe a structure with more context about the error.
18fc94e
to
1c74562
Compare
@anand1976 has updated the pull request. You must reimport the pull request before landing. |
@anand1976 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1c74562
to
13db079
Compare
@anand1976 has updated the pull request. You must reimport the pull request before landing. |
@anand1976 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Fix and enhance the background error recovery logic to handle the following situations - 1. Background read errors during flush/compaction (previously was resulting in unrecoverable state) 2. Fix auto recovery failure on read/write errors during atomic flush. It was failing due to a bug in setting the resuming_from_bg_err variable in AtomicFlushMemTablesToOutputFiles.
13db079
to
16d281a
Compare
@anand1976 has updated the pull request. You must reimport the pull request before landing. |
@anand1976 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Fix and enhance the background error recovery logic to handle the
following situations -
resulting in unrecoverable state)
It was failing due to a bug in setting the resuming_from_bg_err variable
in AtomicFlushMemTablesToOutputFiles.
Test plan:
Add new unit tests in error_handler_fs_test