Skip to content

Conversation

@triplef
Copy link
Member

@triplef triplef commented Dec 17, 2025

We have some users where writing the NSUserDefault settings fails on Windows with ERROR_NOT_SAME_DEVICE when the settings plist gets renamed from the auxiliary file to the final filename. This can apparently happen due to Windows packaged app filesystem redirection/virtualization in some cases redirecting files via reparse points or even the same path string resolving to different backing locations depending on operation (e.g. create vs open existing).

The change adds a fallback by copying the file instead of renaming, which works across volumes.

@triplef triplef requested a review from rfm as a code owner December 17, 2025 10:30
@rfm
Copy link
Contributor

rfm commented Dec 17, 2025

Copying is not atomic, so it seems to me that the proposed change is breaking the method rather than providing a fallback.
Atomic operations are used to avoid another process or thread reading incomplete content.
Maybe a fallback to using the ReplaceFile() function would do the job? Or maybe windows has some other mechanism to create a mandatory lock on a file such that other processes can't read from it while we are writing to it?

If it turns out to be impossible to implement that method completely reliably on windows, I think failing is the correct behavior.

In the specific case of NSUserDefaults, we probably already hold a cooperative filesystem level lock to prevent other processes accessing the defaults at the same time. If that's working reliably we don't really need an atomic lock there, and the NSZUserDefaults code could fall back to doing a non-atomic write if the atomic write fails.

@rfm
Copy link
Contributor

rfm commented Dec 17, 2025

I suppose the other (possibly better) option would be to check what device the destination file is on, and make sure to create the auxilliary file is on that device, so this particular failure could not occur.

@triplef
Copy link
Member Author

triplef commented Dec 17, 2025

We are actually seeing this error 17 (ERROR_NOT_SAME_DEVICE) for paths that both have the same drive letter, but somehow through filesystem redirection/virtualization they seem to end up on different file systems. Here is the full error we are seeing:

Failed to write defaults database to file: C:/Users/<user>/AppData/Local/<company>/<app>/Preferences/com.example.app.plist	
File NSData.m: 2101. In -[NSData writeToFile:options:error:] Rename ('C:/Users/<user>/AppData/Local/<user>/<app>/Preferences/com.example.app.plista28340' to 'C:/Users/<user>/AppData/Local/<user>/<app>/Preferences/com.example.app.plist') failed - Error Domain=NSOSStatusErrorDomain Code=17 "The system cannot move the file to a different disk drive. 	
"

You are of course right that the copy fallback is not atomic, but it’s unclear to me how to fix this in a better way. The auxiliary file is already being created on the same drive according to the path. I guess there might be a more involved fix where we would use additional Windows APIs to figure out the actual destination path after redirection, but that seems difficult to get right and test. And if we can’t guarantee that the files are actually on the same volume, as far as I know there is no Windows API for an atomic replace across volumes.

Happy to try other approaches if you have more ideas how to improve this.

@rfm
Copy link
Contributor

rfm commented Dec 17, 2025

I don't work on windows, but from what I can find searching online, your best approach might be to use LockFileEx() for atomic writes. So instead of using an auxilliary file, you open the original file, lock it, overwrite it, and release the lock.
Assuming LockFileEx() is reliable, that gets you the atomicity. The downside is that it obviously blocks other processes for as long as it takes to write to the file (usually a negligible performance impact, but likely to be an issue in some cases), so it would be best done as a fallback rather than a simple replacement for the current code.

@triplef triplef force-pushed the win-nsdata-error-not-same-device branch from ba50501 to 9976fe0 Compare December 18, 2025 11:32
@triplef
Copy link
Member Author

triplef commented Dec 18, 2025

If I understand it correctly locking wouldn’t protect against partial writes e.g. if the app crashes, and only works if readers lock too.

I’ve implemented another approach based on your other suggestion:

I suppose the other (possibly better) option would be to check what device the destination file is on, and make sure to create the auxilliary file is on that device, so this particular failure could not occur.

I still kept the copy fallback, but with this change in theory it shouldn’t happen any more. How does that look to you?

@rfm
Copy link
Contributor

rfm commented Dec 18, 2025

I think that looks good except ... I'm still not really happy about an explicitly atomic API doing a non-atomic copy.
How about implementing the fallback of copying elsewhere. So if your solution doesn't work, the write fails, but the higher level code (NSUserDefaults in your case) can then do a non-atomic write and use NSLog() to say what it is doing and what might be broken (another process could fail to read defaults correctly, or a partial write could leave the file corrupt). Then, at least users would know there is an issue and have a reasonable chance to fix it.

@rfm
Copy link
Contributor

rfm commented Dec 19, 2025

I can kind of see why you might want to put the fallback code in -writeToFile:atomically: rather than having higher level code deal with the issue (it's no great trouble to get NSUserDefaults to do it, but in practice most code expects methods to work and doesn't bother checking return values).

I'm not completely set against that, but I feel it would need a lot of work as follows:
0. try to write atomically and on failure fallback to:

  1. log, as fully as possible, the reason for the failure
  2. log/alert the user telling them that a copy is about to be done and that the non-atomic nature of this means that other process may read incomplete data, and if a crash occurs the file may be left incomplete
  3. attempt the copy and then deletion of the auxilliary file
  4. log/alert the user to tell them if the copy worked (and if the auxilliary file was deleted)
  5. return the result of the copy as the result of the atomic write.

@triplef
Copy link
Member Author

triplef commented Dec 19, 2025

Thanks for your additional thoughts. It makes sense what you say about an explicitly atomic API doing a non-atomic copy.

Since the original issue should hopefully be fixed even without the fallback, I’ll just remove it and we keep this change to just resolving the destination directory.

@triplef triplef force-pushed the win-nsdata-error-not-same-device branch from 9976fe0 to 8b58787 Compare December 19, 2025 13:29
@triplef triplef force-pushed the win-nsdata-error-not-same-device branch from 8b58787 to 450ba9c Compare December 19, 2025 13:33
@rfm rfm merged commit 2b236ec into master Dec 21, 2025
17 of 18 checks passed
@triplef triplef deleted the win-nsdata-error-not-same-device branch December 22, 2025 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants