-
Notifications
You must be signed in to change notification settings - Fork 462
fix(Fpath.rm_rf_dir): retry clearing directory after interleaved writes #13371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Antonio Nuno Monteiro <anmonteiro@gmail.com>
3b965c9 to
2bb5d83
Compare
| | exception Unix.Unix_error (ENOENT, _, _) -> | ||
| (* How can we end up here? [clear_dir] cleared the directory successfully, | ||
| and rm_rf_dir = | ||
| let max_rm_dir_attempts = 5 in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can still fail after 5 times though. How about on failure, we just give up and move the sandbox to some place like _build/.broken-sandboxes that we lazily clean-up on start-up before creating the first sandbox? I think this is more reliable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also be useful on Windows for which there is a specialised win32_unlink function above. The issue on Windows is that background services like the virus scanner can hog the file permissions and cause a sandbox deletion to fail. Clearing up the sandbox doesn't really need to be so fatal, so I think moving it to a broken sandbox directory would help us avoid the retry code there too.
|
It would be good to know which actions are causing this for you. Such an action still has processes lingering after dune considers it finished. |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
|
What's the alternative? Opening the fd after dispatching? Sure, you can do that but I don't see what good it would do. Are you so confident that this is relevant to the issue at all though? |
This comment was marked as off-topic.
This comment was marked as off-topic.
|
It's hard to tell which rule is being built when this happens, but in the limited logging I added, the directory being removed was: |
|
Thanks for investigating. Unfortunately that's a bit too vague as just about any compilation exception will have these assembly artifacts. I am bit a surprised that this directory is outside the sandbox and we're still trying to clean it up though. Going to draft this PR for now since I'd like to see the fix that I mentioned earlier here instead. |
in high-concurrency scenarios,
Fpath.rm_rf_dirmay end up in a scenario where the directory must be cleared again after an interleaved write.