Sync filesystem before rename during atomic write #98361
base: master
Conversation
Make sure the contents of the file are written to disk before renaming it on top of any existing file. In the event of power loss or OS crash, this ensures that the file will contain either the old version or the complete contents of the new version. Otherwise, the user could be left with an incompletely written file and data loss.
Hi! Overall looks fine, but there are a couple of comments I have:

**`fileno` can fail**

Currently:

```c
int fd = fileno(f);
fsync(fd);
```

I suggest changing it to:

```c
int fd = fileno(f);
if (fd == -1) {
    // Error. Set badbit.
} else {
    int status = fsync(fd);
    if (status != 0) {
        // Error. Set badbit.
    }
}
```

By "badbit" I mean tagging the structure as bad, since something (likely unrecoverable) went wrong with file handling.

**Handling Apple**

On Apple you need `F_FULLFSYNC`:

```c
#ifdef __APPLE__
int status = fcntl(fd, F_FULLFSYNC, NULL);
if (status) {
    // If we are not on a file system that supports this,
    // then fall back to a plain fsync.
    status = fsync(fd);
    if (status != 0) {
        // Error. Set badbit.
    }
}
#endif
```

See this discussion.

**On non-Apple, consider `fdatasync` instead**

Instead of calling `fsync`, call `fdatasync`. Update: SQLite checks at build time whether `fdatasync` is reliable or not, and chooses between `fdatasync` and `fsync`.

**No benchmark data provided, no regression tests**

Occasionally `fsync`/`fdatasync` can cause a global system slowdown when applied indiscriminately at the process level. These operations should be used sparingly, for specific file data whose contents are of utmost importance to preserve. This usually means config files and user data. Editor files that can be restored because they're already in version control should not use this feature. Files that go into the cache should not use this feature.

The problem it's solving is a big one (without a proper fsync, power failures can cause both the backup and the new data to be lost*), but there needs to be some analysis of the impact, as it can potentially put loading times back to 1990.

\* It is a bit concerning that the Reddit post you linked to says all open scn files were destroyed as a result of the power failure. That suggests Godot is aggressively overwriting files even without any changes.

Ideally, all files being written would defer the fsync-then-OS-rename to the end of all IO processing, to maximize throughput. But that would likely require a higher-level redesign.
The only way Actually, what I should do is retry
It looks like Apple now recommends
I don't think this will make any noticeable performance difference.
I would rather force the mtime to be synced so that a user inspecting their save files after a power loss sees the correct timestamp for when their game is saved.
Unfortunately, Godot's
I disagree with this statement. The editor doesn't really have real-time performance expectations when it comes to operations like saving files.

An application shouldn't be relying on an external tool for basic corruption protection of user data. The fact that I've found two cases of people losing large parts of their projects due to this is evidence that we shouldn't simply rely on version control. Many Godot users are artists, not computer scientists. They may be struggling with Git, and therefore using it less frequently than you or I. Also, even if we told users to make sure they updated version control after each change, there would be no performance gain, as Git is calling the exact same

The real problem with the editor is the serialization of file operations. A file is completely written and closed before moving on to the next file. A better way to handle this in the editor would be to make
This would mean that we would only wait once for physical storage to catch up. Again, this would be a proposal beyond the scope of this change. I think this is what you are suggesting at the end of your comment.
Just to be clear, this isn't going to affect load times. My subjective ad-hoc testing shows minimal impact when dealing with a few large files on a USB thumb drive, and no noticeable impact when dealing with a high-performance internal SSD. As games generally write only one or a handful of files during each save operation, I don't expect this change to be noticeable in the vast majority of games. Also, only games that have explicitly set

The real opportunity for noticeable performance regression is when a large number of files are written sequentially. The editor may do this, and the editor does set

I'll see if I can find some time to set up an example project with 1,000 small/medium files and measure the time difference in saving the project.
Agreed 😄
OK, I'll elaborate. I've been bitten by the fsync monster before. Let's say Godot is calling fsync 240 times per second for 10 seconds. This would annihilate overall system performance by forcing SSDs to flush aggressively. It also severely wears down SSD cells. We are talking about a 10x to 1000x level of slowdown, especially on high-performance SSDs, because the high-performance ones (i.e. server-grade) are built to fully honor flushes. It'd be the equivalent of setting dirty_writeback_centisecs to 0.41 (the Linux kernel only accepts integer values). Basically what I'm saying is some extra guards are needed. Off the top of my head (I'm thinking out loud):
Basically we need to understand what's going on and mark potential sources of performance problems.
I don't really care if saving 1000 small files takes 100x longer, because it is done so safely (but of course, being able to improve the time it takes with higher level refactors is always welcome). I'm more worried about Godot using sync() in unnecessary moments, causing overall slowdown at all times or during project loads. Which brings me to:
I'm just saying we need to understand the problem more and understand what's going on. But I am heavily worried that ~12 open scenes were all lost after a power failure. Having 1 to 3 files corrupted would be normal. 12? Something's up and needs more research. Gathering data is key.
In each of the Windows and Unix implementations of `FileAccess`, add an internal `sync()` operation that's called right before the rename if `is_backup_save_enabled()`. On Unix, `sync()` calls `fsync()` on the underlying file descriptor. On Windows, `sync()` calls `FlushFileBuffers()` on the underlying file handle.

This fixes #98360.