Skip to content

Conversation

@anmonteiro
Copy link
Collaborator

in high-concurrency scenarios, Fpath.rm_rf_dir may end up in a scenario where the directory must be cleared again after an interleaved write.

@anmonteiro anmonteiro requested a review from rgrinberg January 19, 2026 00:42
Signed-off-by: Antonio Nuno Monteiro <anmonteiro@gmail.com>
@anmonteiro anmonteiro force-pushed the anmonteiro/retry-clear-dir branch from 3b965c9 to 2bb5d83 Compare January 19, 2026 02:10
| exception Unix.Unix_error (ENOENT, _, _) ->
(* How can we end up here? [clear_dir] cleared the directory successfully,
and rm_rf_dir =
let max_rm_dir_attempts = 5 in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still fail after 5 times though. How about on failure, we just give up and move the sandbox to some place like _build/.broken-sandboxes that we lazily clean-up on start-up before creating the first sandbox? I think this is more reliable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also be useful on Windows for which there is a specialised win32_unlink function above. The issue on Windows is that background services like the virus scanner can hog the file permissions and cause a sandbox deletion to fail. Clearing up the sandbox doesn't really need to be so fatal, so I think moving it to a broken sandbox directory would help us avoid the retry code there too.

@rgrinberg
Copy link
Member

It would be good to know which actions are causing this for you. Such an action still has processes lingering after dune considers it finished.

@Alizter

This comment was marked as off-topic.

@Alizter

This comment was marked as off-topic.

@rgrinberg
Copy link
Member

What's the alternative? Opening the fd after dispatching? Sure, you can do that but I don't see what good it would do.

Are you so confident that this is relevant to the issue at all though?

@Alizter

This comment was marked as off-topic.

@anmonteiro
Copy link
Collaborator Author

It's hard to tell which rule is being built when this happens, but in the limited logging I added, the directory being removed was: /tmp/nix-shell.q8ptGB/build_74613a_dune/build_82414a_dune, and it contained a file named camlasm67f204-90f29a.s when it threw the exception.

@rgrinberg
Copy link
Member

Thanks for investigating. Unfortunately that's a bit too vague as just about any compilation exception will have these assembly artifacts. I am bit a surprised that this directory is outside the sandbox and we're still trying to clean it up though.

Going to draft this PR for now since I'd like to see the fix that I mentioned earlier here instead.

@rgrinberg rgrinberg marked this pull request as draft January 20, 2026 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants