
Recursively delete directories on Windows without race conditions #110

Open
Rufflewind opened this issue Jul 19, 2020 · 11 comments
Labels
type: a-bug The described behavior is not working as intended.

Comments

@Rufflewind
Member

Rufflewind commented Jul 19, 2020

Niall Douglas' "Racing the File System" at CppCon actually outlines a deletion algorithm that works correctly on Windows. It does not require retries.

The gist of it:

  • Deletion is effectively asynchronous on Windows.
  • If deleting recursively, first rename each item out of the directory being deleted, then delete it.

This provides an alternative solution to the issue described in #96 and #108.
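To make the idea concrete, here is a minimal, portable sketch of the rename-out-then-delete scheme. This is not the directory implementation; `remove_tree` and the staging-directory scheme are made up for illustration, and on Windows the renames would be what keeps pending (asynchronous) deletions from making a parent directory look non-empty:

```python
import os
import tempfile
import uuid

def remove_tree(path):
    """Recursively delete `path`, renaming each entry out of the
    directory being deleted before deleting it, so that pending
    (asynchronous) deletions cannot make a parent look non-empty."""
    # Stage renamed entries in a sibling temporary directory,
    # outside the tree being deleted.
    parent = os.path.dirname(os.path.abspath(path))
    staging = tempfile.mkdtemp(dir=parent)
    try:
        _remove_tree_into(path, staging)
    finally:
        os.rmdir(staging)

def _remove_tree_into(path, staging):
    for entry in list(os.scandir(path)):
        is_dir = entry.is_dir(follow_symlinks=False)
        # Rename first: renames are atomic, whereas a delete on
        # Windows merely marks the file for deletion.
        staged = os.path.join(staging, uuid.uuid4().hex)
        os.rename(entry.path, staged)
        if is_dir:
            # The recursive call removes the staged directory itself.
            _remove_tree_into(staged, staging)
        else:
            os.unlink(staged)
    os.rmdir(path)
```

The key property is that by the time `os.rmdir(path)` runs, every child has already been renamed out of `path`, so a still-pending child deletion cannot make the `rmdir` fail with "directory not empty".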

Alternative

It seems there's yet another way to do this using Windows 10 V1607's POSIX-compatible API:

SetFileInformationByHandle(FILE_DISPOSITION_INFO_EX { FILE_DISPOSITION_DELETE | FILE_DISPOSITION_POSIX_SEMANTICS })

But this is limited to NTFS.

https://stackoverflow.com/a/50234974

@LaurentRDC
Contributor

This is described on slide 11 of the talk.

@Mistuke
Contributor

Mistuke commented Jul 20, 2020

That makes very little sense. Slide 19 indicates that you can't rename the file if any process holds a read lock on it, which is exactly the problem.

The rename method only prevents new processes from acquiring read locks on the file when they try to open it by name, since renames are atomic, whereas deleting a file merely marks it for deletion.

It explicitly says you still have to retry. It also doesn't take into account services that lock any new file, such as an AV scanner: after the rename, the file can still get locked. So I don't understand how that method "solves" it.

@Rufflewind
Member Author

Sorry, I think I may have conflated two issues:

  • Sometimes another program, such as the search indexer or an AV scanner, might keep a file or directory locked for a brief amount of time, preventing deletion. This approach does not address that problem.

  • Deletion is asynchronous, so trying to delete a parent directory after "deleting" its children may fail because the children are still pending deletion. This issue is intended to address that problem.

@hasufell
Member

I'd say this algorithm makes deletion less atomic and complicates things in case of partial failure and exceptions. You'll end up with stray directories, no? Retrying is a viable solution to both problems, AFAICS.
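For comparison, the retry alternative amounts to a bounded retry loop around the recursive delete. A hypothetical sketch (the function name, attempt count, and backoff schedule are all invented for illustration):

```python
import shutil
import time

def remove_tree_with_retries(path, attempts=5, initial_delay=0.05):
    """Delete `path` recursively, retrying on transient failures
    (e.g. a scanner briefly holding a lock on Windows)."""
    delay = initial_delay
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except OSError:
            # Re-raise on the final attempt; otherwise back off
            # exponentially and try again.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
            delay *= 2
```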

@LaurentRDC
Contributor

LaurentRDC commented Jul 20, 2020

Retrying is a viable solution to both problems, AFAICS.

The inclusion of retrying functions in base is still contentious (see here).

@Mistuke
Contributor

Mistuke commented Jul 21, 2020

I personally don't understand why it's being added to base rather than to an uncoupled library like directory. Adding it to directory makes more sense, since it can then be used with any version of the compiler.

Adding it to base restricts it to the bleeding edge for no reason. If this is really so contentious, I'll just add it to Win32 and move on. But I don't want to split functionality across libraries unless we need to.

I think we're over-engineering this and overthinking a simple solution. But that's just my two cents.

@LaurentRDC
Contributor

If this is really so contentious, I'll just add it to Win32 and move on.

It appears that this would be the least disruptive solution. This is a Windows-specific problem, and as discussed on the GHC tracker, retrying is somewhat problem-specific.

@Rufflewind
Member Author

Rufflewind commented Jul 26, 2020

You'll end up with stray directories, no?

It looks like you can use FILE_FLAG_DELETE_ON_CLOSE to avoid stray items. The Rust implementation below seems like a good reference:

https://docs.rs/crate/remove_dir_all/0.5.3/source/src/fs.rs


On another note, is there any reason SHFileOperation({wFunc: FO_DELETE}) can't be used? That would lead to the simplest solution, avoiding a lot of the complexity.

The only counterargument I've seen is that the API does not work on handles rust-lang/rust#29497 (comment), which does not apply to directory.

@Mistuke
Contributor

Mistuke commented Jul 26, 2020

You'll end up with stray directories, no?

It looks like you can use FILE_FLAG_DELETE_ON_CLOSE to avoid stray items. The Rust implementation below seems like a good reference:

https://docs.rs/crate/remove_dir_all/0.5.3/source/src/fs.rs

FILE_FLAG_DELETE_ON_CLOSE introduces a race condition because, unlike on Unix, the directory entry remains but is marked unusable, whereas on Unix only the inode lingers. Your DeleteFile operation will complete successfully, but the filename is unusable until the last handle is closed.

This means that if the programmer tries to open a new file with the same name/path, they will get an access-denied error from CreateFile. This quite literally just moves the problem.

On another note, is there any reason SHFileOperation({wFunc: FO_DELETE}) can't be used? That would lead to the simplest solution, avoiding a lot of the complexity.

The only counterargument I've seen is that the API does not work on handles rust-lang/rust#29497 (comment), which does not apply to directory.

It is not entirely clear to me that these shell API functions support NT namespace paths. https://docs.microsoft.com/en-us/windows/win32/api/shellapi/ns-shellapi-shfileopstructw suspiciously mentions that the buffer must be large enough to hold MAX_PATH. If they don't, it's a major regression and won't work for Haskell programs.

It is also not entirely clear that they actually solve the problem. The function has a documented return code for when a sharing violation occurs during the operation. In fact, the recursive delete it provides can fail for some files, so it can return partial failures; you just won't know which files failed.

@Rufflewind
Member Author

To clarify, the scope of the current issue is:

  1. Make sure that recursive deletion does not have internal race conditions -- it should not "trip over itself" as it ascends the directory hierarchy.

  2. Allow deletion to proceed even if files are held open by other processes, as long as those processes enable FILE_SHARE_DELETE.

To maintain current behavior, support for long paths would be a requirement.

On the other hand, the following are out of scope of this issue:

  • The asynchronicity of deletion on Windows: it is not possible to offer a truly synchronous deletion API that guarantees termination except in limited circumstances, e.g. NTFS on Win 10 V1607.

    That being said, one can reduce the impact on subsequent operations that expect the directory to be gone by using the renaming trick. Cygwin's implementation goes a step further by using the Recycle Bin for this purpose.

  • Lack of knowledge about the state of the target tree after a partially failed deletion: the current API does not provide any information on that, and there are no plans to change that.

@Rufflewind
Member Author

https://bugs.python.org/issue40143#msg365697 noted that DeleteFileW on Windows 10 RS1 uses FILE_DISPOSITION_POSIX_SEMANTICS by default on NTFS. RemoveDirectoryW might still be affected, however.

Given that, I'm curious whether Microsoft plans to (eventually) fix the issue on their end. If that's their plan, it may be better to stick to the conventional Win32 APIs and wait for them to fix it than to create our own unique workarounds.
