-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race in multiops txn stress test #12847
Conversation
2af37ce
to
52f5203
Compare
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@cbi42 has updated the pull request. You must reimport the pull request before landing. |
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
for (auto cf : column_families_) { | ||
delete cf; | ||
} | ||
column_families_.clear(); | ||
if (db_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MultiOpsTxnsStressListener::OnCompactionCompleted() access db_ ...
Will this happen when we are trying to if (db_) {}..
here in the CleanUp()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can happen, I think during Close() we will wait for compactions to finish and stop further compactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting that the listener only reads the db pointer to check the DB has not been closed yet
void OnCompactionCompleted(DB* db, const CompactionJobInfo& info) override {
assert(db);
#ifdef NDEBUG
(void)db;
#endif
assert(info.cf_id == 0);
const ReadOptions read_options(Env::IOActivity::kCompaction);
stress_test_->VerifyPkSkFast(read_options, info.job_id);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is more usage of db_ inside VerifyPkSkFast() below.
bool run_stress_test = RunStressTest(shared.get()); | ||
// Close DB in CleanUp() before destructor to prevent race between destructor | ||
// and operations in listener callbacks (e.g. MultiOpsTxnsStressListener). | ||
stress->CleanUp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean DB's Close() called in DB's destructor will still race with access of db_ pointer even though the destructor hasn't existed and db_pointer is not nullptr yet at that point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we call Close() explicitly, DB destructor should not call DB Close() again.
@cbi42 has updated the pull request. You must reimport the pull request before landing. |
Thanks for the review. |
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary:
MultiOpsTxnsStressListener::OnCompactionCompleted()
accessdb_
and can be called while db_ is being destroyed in ~StressTest(). This causes TSAN to complain about data race. This PR fixes this issue by calling db_->Close() first to stop all background work. Also moved the cleanup out of StressTest destructor to avoid race between the listener and ~StressTest().Test plan: monitor crash test failure.