Skip to content

Commit

Permalink
Fix race condition in SstFileManagerImpl error recovery code (#9435)
Browse files Browse the repository at this point in the history
Summary:
There is a race in SstFileManagerImpl between the ClearError() function
and CancelErrorRecovery(). The race can cause ClearError() to deref the
file system pointer after it has been freed. This is likely to occur
during process shutdown, when the order of destruction of the
DB/Env/FileSystem and SstFileManagerImpl is not deterministic.

Pull Request resolved: facebook/rocksdb#9435

Test Plan:
Reproduce the crash in a TSAN build by introducing sleeps in the code, and verify with
the fix.

Reviewed By: siying

Differential Revision: D33774696

Pulled By: anand1976

fbshipit-source-id: 643d3da31b8d2ee6d9b6db5d33327e0053ce3b83
  • Loading branch information
anand76 authored and facebook-github-bot committed Jan 26, 2022
1 parent 8822562 commit beb86ad
Show file tree
Hide file tree
Showing 2 changed files with 4 additions and 2 deletions.
1 change: 1 addition & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ Note: The next release will be major release 7.0. See https://github.com/faceboo
* Fix a bug that FlushMemTable may return ok even flush not succeed.
* Fixed a bug of Sync() and Fsync() not using `fcntl(F_FULLFSYNC)` on OS X and iOS.
* Fixed a significant performance regression in version 6.26 when a prefix extractor is used on the read path (Seek, Get, MultiGet). (Excessive time was spent in SliceTransform::AsString().)
* Fixed a race condition in SstFileManagerImpl error recovery code that can cause a crash during process shutdown.

### New Features
* Added RocksJava support for MacOS universal binary (ARM+x86)
Expand Down
5 changes: 3 additions & 2 deletions file/sst_file_manager_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -256,7 +256,7 @@ void SstFileManagerImpl::ClearError() {
while (true) {
MutexLock l(&mu_);

if (closing_) {
if (error_handler_list_.empty() || closing_) {
return;
}

Expand Down Expand Up @@ -297,7 +297,8 @@ void SstFileManagerImpl::ClearError() {

// Someone could have called CancelErrorRecovery() and the list could have
// become empty, so check again here
if (s.ok() && !error_handler_list_.empty()) {
if (s.ok()) {
assert(!error_handler_list_.empty());
auto error_handler = error_handler_list_.front();
// Since we will release the mutex, set cur_instance_ to signal to the
// shutdown thread, if it calls // CancelErrorRecovery() the meantime,
Expand Down

0 comments on commit beb86ad

Please sign in to comment.