-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use fcntl(F_FULLFSYNC) on OS X #9356
Conversation
@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Do we know the performance impact of this change? Maybe worth to mention that in the HISTORY.md.
Thanks @jay-zhuang for the review. |
@jay-zhuang updated PR description to mention potential performance impact. |
ff6465f
to
3819e15
Compare
@riversand963 has updated the pull request. You must reimport the pull request before landing. |
@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
fsync/fdatasync on Linux: ``` (fsync/fdatasync) includes writing through or flushing a disk cache if present. ``` However, on OS X and iOS: ``` (fsync) will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence. ``` Solution is to use `fcntl(F_FULLFSYNC)` on OS X so that we get the same persistence guarantee. This is similar to bug reports/fixes in: - golan: golang/go#26650 - leveldb: google/leveldb@296de8d. Not sure if we should fallback to fsync since we break persistence contract.
3819e15
to
69bcbf9
Compare
@riversand963 has updated the pull request. You must reimport the pull request before landing. |
69bcbf9
to
19ea243
Compare
@riversand963 has updated the pull request. You must reimport the pull request before landing. |
@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
This change has a huge performance impact. Running tests on MacOs now takes significantly more time. With this change: Prior to this change: [----------] Global test environment tear-down This is just one example test of many that are significantly slower. |
We can disable fsync in tests: Lines 916 to 919 in 875bfd7
A non-comprehensive search found it set to true in 16/69 places, but it seems unlikely there's really 53+ tests that need to fsync. |
To be fair, the above unit test (DBIteratorTestInstance/DBIteratorTest.IterWithSnapshot/*) amplifies the performance overhead caused by F_FULLFSYNC by calling fcntl(F_FULLFSYNC) 1684 times in total.
|
Is there anything that can be done to make this better? A make -j check is taking an incredibly long time on the Mac now. Many of the tests take an extra 5-10 seconds per test case. Testing was already slower on the Mac than Linux but this has made it unbearably worse. To run the table_test suite on my machine takes 15x as long as it did prior to this change. Other tests (as mentioned earlier) also are much slower. I have spent no time trying to determine why the code is doing the sync, but can see that this change is what is causing the issues. |
I can think of a few things.
Should 2 be considered, I would also suggest that #9356 (comment) be investigated. |
The CircleCI tests are also failing sporadically on the Mac with "deadline exceeded". I submitted a PR that is not used for most of the tests (only a new test) and it still fails, so the CI environment cannot be relied upon. I was going to suggest a compile time variable such as ROCKSDB_DISABLE_FULL_FSYNC, similar to what is done with ROCKSDB_DISABLE_SYNC_IN_RANGE and ROCKSDB_DISABLE_FALLOCATE. Is there a reason not to implement such a solution? If I have time, I will attempt to figure out how many calls (and from where) to fsync we are making for these tests, and how long each type of call is taking. At the moment, this change is making testing on the Mac very painful and is killing performance. While that might be necessary for data integrity, it should be something that the user opts into, not that is always on. |
This has been happening even before this commit. At present, there has not been history long enough to demonstrate this commit is a problem for CI. We will keep an eye on this.
So far, there has been no evidence of this commit hurting the overall performance of applications that use rocksdb library. I would be surprised if so because we keep |
The TableTest/ParameterizedHarnessTest.RandomizedHarnessTest suite takes 15x what it used to on my Mac. It does not call sync/fsync at all in the test code and uses the default Env. One test on my machine that now takes 14000+ ms performed 305 syncs + 305 fsyncs. There are no calls to sync in the test itself -- it is writing data and doing compaction. I agree that this test probably does not need to do fsync but it would need to be changed to use a "NoSyncFileSystem". Additionally, we would need to track down all of the other tests that are much slower because of this and change them as well. And if anyone is in production on a Mac, I would guess they will not be pleasantly surprised at how long these operations now take. |
This PR does not change how many
This is a parameterized tests. For each test case, it will create new db at least once involving syncing dir and file sync. Check FWIW, there are several things to consider
|
You can specify |
Not all tests (table_test for example) go through SpecialEnv so using that "flag" will not work. Furthermore, table_test does not do anything special -- build a table, compaction, gets and seeks. It does not ever explicitly call sync. Each of the table_test ParameterizedHarnessTest that use a DB used to take less than 1 second and now take well over 4+ seconds. Some investigation shows that the test with args "reverse=0, restart=16 compression=kBZip2Compression threads=4, format=2 mmap=0" is doing: Each of these "sync" operations appears to be taking around an additional 23 ms (I have disabled them all individually and it does not appear any one class of operation is taking longer than another). Perhaps the 23 ms per operation is a reasonable "tax" to pay. But the cumulative total for this test is over 14+ seconds. I will code up a FileSystem that can be taught to skip these various operations that we can use for testing. But perhaps someone should investigate why we are doing so many of them and if it is really necessary? These are fairly simple tests of a small database. What will happen in a "real world" scenario? |
Can you try the following? It can easily be wrapped in a script to support running I actually tried the following, the test execution time went down to 729ms for OS X.
|
With the approach of using a Ramdisk on Mac, this adds little value while introducing unnecessary code.
This test simple, but it still requires
I doubt whether there is real world scenario repeatedly calling |
Closing #5954
fsync/fdatasync on Linux:
However, on OS X and iOS:
Solution is to use
fcntl(F_FULLFSYNC)
on OS X so that we get the samepersistence guarantee.
According to OSX man page,
This suggests that it will be no faster than
fsync
on Linux, since Linux, according to its man page,It means Linux may not flush all data from disk cache.
This is similar to bug reports/fixes in:
Not sure if we should fallback to fsync since we break persistence contract.