Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support read & write with unsynced data in FaultInjectionTestFS #12852

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Jul 10, 2024

Summary: Follow-up to #12729 and others to fix FaultInjectionTestFS handling the case where a live WAL is being appended to and synced while also being copied for checkpoint or backup, up to a known flushed (but not necessarily synced) prefix of the file. It was tricky to structure the code in a way that could handle a tricky race with Sync in another thread (see code comments, thanks Changyu) while maintaining good performance and test-ability.

For more context, see the call to FlushWAL() in DBImpl::GetLiveFilesStorageInfo().

Also, the unit test for #12729 was neutered by #12797, and this re-enables the functionality it is testing.

Test Plan: unit test expanded/updated. Local runs of blackbox_crash_test.

The implementation is structured so that a multi-threaded unit test is not needed to cover at least the code lines, as the race handling is folded into "catch up after returning unsynced and then a sync."

Summary: Follow-up to facebook#12729 and others to fix FaultInjectionTestFS
handling the case where a live WAL is being appended to and synced while
also being copied for checkpoint or backup, up to a known flushed (but
not necessarily synced) prefix of the file. A mutex handles concurrency;
we just have to deal with interleavings of legitimate operations.

For more context, see the call to FlushWAL() in
DBImpl::GetLiveFilesStorageInfo().

Also, the unit test for facebook#12729 was neutered by facebook#12797, and this
re-enables the functionality it is testing.

Test Plan: unit test expanded/updated
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

utilities/fault_injection_fs.cc Outdated Show resolved Hide resolved
s = target()->Read(n, options, result, scratch, dbg);
if (!s.ok()) {
return s;
}

assert(result->size() == 0 || target_read_pos_ == read_pos_);
target_read_pos_ += result->size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the file is synced here? It seems we may not read all unsynced data since the buffer will be cleared before we try to read from it. Should we lock mutex_ before target()->Read() above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! I've almost completely revamped the approach to make it resilient to this race without killing performance or making the implementation or testing a nightmare.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Member

@cbi42 cbi42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some minor comment and question. The rest LGTM, thanks for fixing this tricky problem!

void MoveToScratchIfNeeded(Slice* result, char* scratch) {
if (result->data() != scratch) {
// NOTE: might overlap
std::copy_n(result->data(), result->size(), scratch);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it okay to use copy_n if they might overlap? Documentation says it can lead to unpredictable ordering of the results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I must have skimmed over fast enough to see its behavior wasn't "undefined" like memcpy, but apparently the ordering of the copying is undefined. I'll switch to std::copy.

utilities/fault_injection_fs.cc Show resolved Hide resolved
@pdillinger
Copy link
Contributor Author

Actually, I'll address both of those things in an immediate follow-up PR

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in 72438a6.

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 12, 2024
Summary: In follow-up to facebook#12852:
* Use std::copy in place of copy_n for potentially overlapping buffer
* Get rid of troublesome -1 idiom from `pos_at_last_append_` and
  `pos_at_last_sync_`
* Small improvements to FaultInjectionFSTest.ReadUnsyncedData

Test Plan: CI, crash test, etc.
facebook-github-bot pushed a commit that referenced this pull request Jul 15, 2024
Summary:
In follow-up to #12852:
* Use std::copy in place of copy_n for potentially overlapping buffer
* Get rid of troublesome -1 idiom from `pos_at_last_append_` and `pos_at_last_sync_`
* Small improvements to test FaultInjectionFSTest.ReadUnsyncedData

Pull Request resolved: #12861

Test Plan: CI, crash test, etc.

Reviewed By: cbi42

Differential Revision: D59757484

Pulled By: pdillinger

fbshipit-source-id: c6fbdc2e97c959983184925a855cc8b0285fa23f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants