Tested versions
Reproducible in 4.3 stable and 4.4 at HEAD. Looking at the source history, I believe this bug has existed since at least 2014.
System information
Windows 11 and Ubuntu 24
Issue description
Godot's FileAccess is used both by the editor to save resources and by game developers to save game state. To reduce the risk of files being left in an intermediate state in the event of an error, FileAccess is able to write to a temporary file, then move that file on top of the existing file. This is the default behavior in the editor and in any game where OS.set_use_file_access_save_and_swap(true) is used. While this is good enough to protect against errors and crashes in Godot itself, it does not provide an atomic operation that protects against power loss or a crash of the operating system.
Before renaming the temporary file, it's essential to ensure that the newly written contents have actually been committed to the underlying storage and aren't still sitting in the OS buffers. Otherwise, the effects of the rename operation may be written to disk before the contents of the file. Power loss or OS crash during this state could leave a partially written file in place of the original, with no direct way to recover the original.
On POSIX systems, committing to underlying storage can be accomplished with the fsync() system call. When called with a file descriptor, fsync() will block until all outstanding writes associated with the file descriptor have been acknowledged by the underlying storage device as being stable against power loss. On Windows, the equivalent of fsync() is FlushFileBuffers().
Note that fflush() is distinct from fsync(). The former operates between the process and the operating system, and the latter between the operating system and the storage device.
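The ordering described above (write, flush, sync, then rename) can be sketched in a few lines. This is a minimal Python illustration, not Godot's FileAccess code: the function name atomic_write and the ".tmp-" prefix are hypothetical. Python's flush() plays the role of fflush() and os.fsync() the role of fsync(). The final directory sync is an extra step that is not discussed above; it is commonly done on POSIX so that the rename itself is also durable.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`, syncing the contents before the rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temporary file in the same directory so the rename never
    # crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # process buffers -> operating system
            os.fsync(f.fileno())  # operating system -> storage device
        # Only now is it safe to move the new file over the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the rename completed: drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if os.name == "posix":
        # Assumption beyond the text above: also sync the directory entry.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Without the os.fsync() call on the temporary file, the rename could reach the disk before the file contents do, which is the failure this issue describes.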
Examples of other libraries and applications properly using fsync() after writing to a temporary file but before renaming it:
I noticed this problem while scrolling Reddit. One post told the story of how a project was ruined because the poster's computer lost power while saving. Immediately upon reading, it struck me as a classic case of failing to sync the filesystem when attempting to do atomic writes. Looking at FileAccess, my suspicions were confirmed. With a trivial search, I was able to find another post where the exact same thing happened.
The responses from other users to these posts generally admonish the poster to use source control. While using source control is important, they are missing that this issue isn't specific to the editor; it can corrupt game save files as well.
Steps to reproduce
VM Setup
Reproduction requires simulating a system failure. I did this with VMs in VirtualBox and a USB flash drive. Using a thumb drive slows down disk operations compared to my high speed internal SSD and makes it much easier to hit the race condition between writing the file contents and renaming the file. With VirtualBox, it's easy to pass a single USB thumb drive through to the guest operating system.
For Ubuntu, I formatted the drive as ext4.
For Windows, I formatted the thumb drive with NTFS. Additionally, I had to get the Windows guest operating system to treat the thumb drive like an internal hard disk rather than an external device that could be removed at any time, so that writes are cached by the operating system. This is done by opening Device Manager in the Windows guest, identifying the correct USB drive under Disk Drives, right-clicking it and selecting Properties, going to the Policies tab, and changing the Removal Policy from Quick removal to Better performance.
Ubuntu inside of VirtualBox sometimes hung on boot after the hypervisor reset the guest. Resetting the guest again was effective in getting a good reboot.
I was unable to get 3D acceleration working in the Windows VM guest, so Godot was unable to initialize OpenGL in the MRP. To work around this, I hacked a simple command line interface into the MRP that can be used in Godot's headless mode. Simply run godot --headless followed by one or more of the commands listed below.
MRP
The provided reproduction project offers a simple GUI that pseudo-randomly generates two different 100 MB files: file A and file B. Then, either file A or file B can be copied to a third file: file C. Finally, file C can be compared against either file A or file B. Files A, B, and C are all placed in the project directory.
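For readers who want to build similar test data outside Godot, pseudo-random files of this kind could be produced roughly as follows. This is only a sketch in Python and not the MRP's own code: generate_file, the seeds, and the chunk size are hypothetical choices. Using a fixed seed per file makes each file reproducible, and different seeds make files A and B differ.

```python
import random

CHUNK = 1 << 20  # write 1 MiB at a time to keep memory use low


def generate_file(path: str, seed: int, size: int = 100 * 1000 * 1000) -> None:
    """Fill `path` with `size` bytes of seeded pseudo-random data."""
    rng = random.Random(seed)
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            # Draw n random bytes from the seeded generator.
            f.write(rng.getrandbits(8 * n).to_bytes(n, "little"))
            remaining -= n
```

For example, generate_file("file_a.bin", seed=1) and generate_file("file_b.bin", seed=2) would give two distinct files of the same size.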
1. Place the MRP on the thumb drive and mount it in the guest.
2. Launch the project and generate both files A and B. In headless mode, use the gen_a and gen_b commands.
3. Push the button to copy file A to file C. For headless, use copy_a.
4. On Ubuntu, run the fsync command. On Windows, wait 30 seconds.
5. Push the button to copy file B to file C. For headless, use copy_b.
6. When the interface indicates that the copy is complete, wait ~4 seconds. The exact time to wait will depend on the system and may take some tuning.
7. When the designated waiting time has elapsed, immediately have the hypervisor reset the guest. In VirtualBox, this is done by pressing the Host+R key combination. It may be necessary to disable a warning dialog.
8. After rebooting, run the MRP again.
9. Compare file C to both file A and file B. File C should match either file A or file B. If file C matches neither file A nor file B, then it has been corrupted. For headless, the comparison can be done with cmp_a and cmp_b.
10. If no corruption is found, repeat the process by copying whichever file doesn't match file C. Go to step 6.
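The check in the comparison step can also be done outside the MRP. The sketch below is a hypothetical Python helper, not the implementation behind cmp_a and cmp_b: it compares file C byte for byte against files A and B and reports corruption when it matches neither.

```python
import os


def files_equal(path_1: str, path_2: str, chunk: int = 1 << 20) -> bool:
    """Return True if both files have identical contents."""
    if os.path.getsize(path_1) != os.path.getsize(path_2):
        return False
    with open(path_1, "rb") as f1, open(path_2, "rb") as f2:
        while True:
            block_1 = f1.read(chunk)
            block_2 = f2.read(chunk)
            if block_1 != block_2:
                return False
            if not block_1:  # both files reached end of file together
                return True


def classify(file_a: str, file_b: str, file_c: str) -> str:
    """Report which source file C matches, or that it matches neither."""
    if files_equal(file_c, file_a):
        return "matches A"
    if files_equal(file_c, file_b):
        return "matches B"
    return "corrupted"
```

A result of "matches A" after an interrupted copy of file B means the old contents survived, which is the outcome an atomic replace is supposed to guarantee at worst.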
On Ubuntu, I'm able to reproduce the corruption in file C in about 1 out of 4 tries. On Windows, I can reproduce the corruption on almost every try.
Minimal reproduction project (MRP)
godot-nosync-repro.zip