Support read & write with unsynced data in FaultInjectionTestFS #12852

pdillinger · 2024-07-10T17:22:44Z

Summary: Follow-up to #12729 and others to fix FaultInjectionTestFS handling of the case where a live WAL is being appended to and synced while also being copied for checkpoint or backup, up to a known flushed (but not necessarily synced) prefix of the file. It was difficult to structure the code so that it handles a subtle race with Sync in another thread (see code comments, thanks Changyu) while maintaining good performance and testability.

For more context, see the call to FlushWAL() in DBImpl::GetLiveFilesStorageInfo().

Also, the unit test for #12729 was neutered by #12797, and this re-enables the functionality it is testing.

Test Plan: unit test expanded/updated. Local runs of blackbox_crash_test.

The implementation is structured so that a multi-threaded unit test is not needed to cover at least the code lines, as the race handling is folded into "catch up after returning unsynced and then a sync."

Summary: Follow-up to facebook#12729 and others to fix FaultInjectionTestFS handling the case where a live WAL is being appended to and synced while also being copied for checkpoint or backup, up to a known flushed (but not necessarily synced) prefix of the file. A mutex handles concurrency; we just have to deal with interleavings of legitimate operations. For more context, see the call to FlushWAL() in DBImpl::GetLiveFilesStorageInfo(). Also, the unit test for facebook#12729 was neutered by facebook#12797, and this re-enables the functionality it is testing. Test Plan: unit test expanded/updated

facebook-github-bot · 2024-07-10T17:46:07Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


cbi42 · 2024-07-10T23:04:53Z

utilities/fault_injection_fs.cc

  s = target()->Read(n, options, result, scratch, dbg);
  if (!s.ok()) {
    return s;
  }

+  assert(result->size() == 0 || target_read_pos_ == read_pos_);
+  target_read_pos_ += result->size();


What if the file is synced here? It seems we may not read all unsynced data since the buffer will be cleared before we try to read from it. Should we lock mutex_ before target()->Read() above?

Thanks for catching this! I've almost completely revamped the approach to make it resilient to this race without killing performance or making the implementation or testing a nightmare.
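To make the race concrete, here is a minimal single-file sketch of a read path that tolerates a sync landing between the read of the underlying file and the read of the unsynced buffer. It is a toy model and not the code in this PR: `ToyFile`, `ToyReader`, and the `between_steps` hook are hypothetical names invented for illustration, and the synced data is modeled as an in-memory string rather than a real `target()` file. The toy takes the mutex briefly for the first step only because `std::string` is not thread-safe; the gap between the two lock scopes stands in for the unlocked `target()->Read()` call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>

// Hypothetical stand-in for a file with a synced prefix and an unsynced tail.
struct ToyFile {
  std::mutex mu;
  std::string synced;    // what the underlying file system would return
  std::string unsynced;  // appended but not yet synced

  void Append(const std::string& data) {
    std::lock_guard<std::mutex> lock(mu);
    unsynced += data;
  }
  void Sync() {
    std::lock_guard<std::mutex> lock(mu);
    synced += unsynced;  // unsynced bytes become durable
    unsynced.clear();    // and the buffer is emptied
  }
};

// Hypothetical sequential reader over a ToyFile.
struct ToyReader {
  explicit ToyReader(ToyFile* f) : file(f) {}

  std::string Read(size_t n) {
    std::string out;
    {
      // Step 1: read whatever is synced right now (models target()->Read()).
      std::lock_guard<std::mutex> lock(file->mu);
      if (pos < file->synced.size()) out = file->synced.substr(pos, n);
    }
    // Test hook: lets a single-threaded test act as "another thread" here.
    if (between_steps) between_steps();

    std::lock_guard<std::mutex> lock(file->mu);
    size_t cur = pos + out.size();
    // Step 2: catch up. If a sync ran after step 1, bytes we expected in the
    // unsynced buffer have moved to the synced data, so re-read from there.
    if (out.size() < n && cur < file->synced.size()) {
      out += file->synced.substr(cur, n - out.size());
      cur = pos + out.size();
    }
    // Step 3: top up from the unsynced buffer, which starts at synced.size().
    if (out.size() < n && cur >= file->synced.size()) {
      size_t off = cur - file->synced.size();
      if (off < file->unsynced.size()) {
        out += file->unsynced.substr(off, n - out.size());
      }
    }
    pos += out.size();
    assert(pos <= file->synced.size() + file->unsynced.size());
    return out;
  }

  ToyFile* file;
  size_t pos = 0;
  std::function<void()> between_steps;  // empty by default
};
```

Setting `between_steps` to a lambda that calls `Sync()` reproduces the interleaving from the review comment without a second thread: without the catch-up step, the read would come back short because the buffer was cleared before it was consulted.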


facebook-github-bot · 2024-07-11T22:22:33Z

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2024-07-11T22:35:44Z

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2024-07-11T22:36:09Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-07-11T22:44:00Z

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2024-07-11T22:44:26Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


facebook-github-bot · 2024-07-12T01:01:18Z

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2024-07-12T01:01:56Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cbi42

I have some minor comments and a question. The rest LGTM, thanks for fixing this tricky problem!

cbi42 · 2024-07-12T20:32:06Z

utilities/fault_injection_fs.cc

+void MoveToScratchIfNeeded(Slice* result, char* scratch) {
+  if (result->data() != scratch) {
+    // NOTE: might overlap
+    std::copy_n(result->data(), result->size(), scratch);


Why is it okay to use copy_n if they might overlap? Documentation says it can lead to unpredictable ordering of the results.

Thanks. I must have skimmed the documentation too quickly: I saw that its behavior with overlap isn't "undefined" like memcpy, but apparently the order of the copying is unpredictable. I'll switch to std::copy.
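For reference, a small sketch of two ways to move bytes between possibly overlapping buffers. The helper names `MoveBytes` and `CopyForward` are hypothetical and are not taken from the PR. `std::memmove` accepts any overlap. `std::copy` copies front to back and requires that the destination start not lie inside the source range, which holds when data is shifted toward the start of a buffer; whether that precondition always holds for `MoveToScratchIfNeeded` is not something this thread establishes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical helper: safe for any overlap between source and destination.
inline void MoveBytes(const char* src, size_t len, char* dst) {
  if (len == 0 || src == dst) return;
  std::memmove(dst, src, len);
}

// Hypothetical helper: std::copy copies front to back, so it is valid only
// when dst is NOT inside [src, src + len). The assert checks that
// precondition using std::less, which gives a total order on pointers.
inline void CopyForward(const char* src, size_t len, char* dst) {
  const char* d = dst;
  assert(!(std::less_equal<const char*>{}(src, d) &&
           std::less<const char*>{}(d, src + len)));
  std::copy(src, src + len, dst);
}
```

Shifting five bytes from offset 2 of a buffer to offset 0 is valid with either helper, because the destination starts before the source; shifting in the other direction would need `MoveBytes` (or `std::copy_backward`).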


pdillinger · 2024-07-12T22:48:56Z

Actually, I'll address both of those things in an immediate follow-up PR

facebook-github-bot · 2024-07-12T23:06:15Z

@pdillinger merged this pull request in 72438a6.


Summary: In follow-up to #12852:
* Use std::copy in place of copy_n for potentially overlapping buffer
* Get rid of troublesome -1 idiom from `pos_at_last_append_` and `pos_at_last_sync_`
* Small improvements to test FaultInjectionFSTest.ReadUnsyncedData

Pull Request resolved: #12861

Test Plan: CI, crash test, etc.

Reviewed By: cbi42

Differential Revision: D59757484

Pulled By: pdillinger

fbshipit-source-id: c6fbdc2e97c959983184925a855cc8b0285fa23f

pdillinger requested review from hx235 and cbi42 July 10, 2024 17:22

facebook-github-bot added the CLA Signed label Jul 10, 2024

Fix clang warning-as-error

a0e3fdb

cbi42 reviewed Jul 10, 2024


pdillinger added 2 commits July 11, 2024 15:20

Revamp

a36d4f6

Merge remote-tracking branch 'origin/main' into fault_injection_fs_re…

5ff986c

…ad_and_write_unsync

Fix ssize_t/int64_t

bd65f72

Missing file

6838213

pdillinger added 2 commits July 11, 2024 18:00

Fix st.pos_at_last_sync_==-1 cases

d6389d7

Merge remote-tracking branch 'origin/main' into fault_injection_fs_re…

3691312

…ad_and_write_unsync

cbi42 approved these changes Jul 12, 2024


facebook-github-bot closed this in 72438a6 Jul 12, 2024

facebook-github-bot added the Merged label Jul 12, 2024

pdillinger mentioned this pull request Jul 12, 2024

FaultInjectionTestFS follow-up and clean-up #12861

Closed
