Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race in multiops txn stress test #12847

Closed
wants to merge 3 commits into from

Conversation

cbi42
Copy link
Member

@cbi42 cbi42 commented Jul 8, 2024

Summary: MultiOpsTxnsStressListener::OnCompactionCompleted() access db_ and can be called while db_ is being destroyed in ~StressTest(). This causes TSAN to complain about data race. This PR fixes this issue by calling db_->Close() first to stop all background work. Also moved the cleanup out of StressTest destructor to avoid race between the listener and ~StressTest().

Test plan: monitor crash test failure.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cbi42 cbi42 requested a review from jowlyzhang July 8, 2024 22:50
for (auto cf : column_families_) {
delete cf;
}
column_families_.clear();
if (db_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultiOpsTxnsStressListener::OnCompactionCompleted() access db_ ...

Will this happen when we are trying to if (db_) {}.. here in the CleanUp()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can happen, I think during Close() we will wait for compactions to finish and stop further compactions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that the listener only reads the db pointer to check the DB has not been closed yet

  void OnCompactionCompleted(DB* db, const CompactionJobInfo& info) override {
    assert(db);
#ifdef NDEBUG
    (void)db;
#endif
    assert(info.cf_id == 0);
    const ReadOptions read_options(Env::IOActivity::kCompaction);
    stress_test_->VerifyPkSkFast(read_options, info.job_id);
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is more usage of db_ inside VerifyPkSkFast() below.

bool run_stress_test = RunStressTest(shared.get());
// Close DB in CleanUp() before destructor to prevent race between destructor
// and operations in listener callbacks (e.g. MultiOpsTxnsStressListener).
stress->CleanUp();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean DB's Close() called in DB's destructor will still race with access of db_ pointer even though the destructor hasn't existed and db_pointer is not nullptr yet at that point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we call Close() explicitly, DB destructor should not call DB Close() again.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@cbi42
Copy link
Member Author

cbi42 commented Jul 9, 2024

Thanks for the review.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@cbi42 merged this pull request in d6f265f.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants