Spurious file-related failures on Windows runners #10483
Comments
Hi @tgross35, thank you for bringing this issue to us. We are looking into it and will update you after investigating.
Thank you for the response. If you need to watch active jobs there is always one running at https://github.com/rust-lang-ci/rust/actions (mostly the There are also ongoing experiments to run the jobs multiple times with different tweaks and see what fails, e.g. rust-lang/rust#129504 and rust-lang/rust#129522.
@vidyasagarnimmagaddi is there something we could do to debug this better? Our failure rate is currently over 50% due to this issue. Somebody was able to confirm that we encounter this issue even when running CI on an older state of our repo (from before this problem was noticed), which does seem to indicate it is caused by a change to the runner environment rather than changes to our code.
@tgross35 - sure, we will update you shortly with a workaround or solution to the issue.
@ijunaidm Thanks! I'm one of the people working on this on the Rust side. Another data point: I've never been able to reproduce this on the
@ijunaidm are there any updates here, or are you able to help us debug in some way (e.g. provide a way to ssh into active runners)? We were forced to switch to the small runners, which seems to make this issue less prevalent (still very common), but we need to move back to the large runners at some point.
@tgross35 - Sorry, I will update you shortly on this issue.
Description
For the past few months, the rust-lang/rust project has had a lot of spurious failures on the Windows runners. These are typically either failure to open a file (mostly from link.exe) or failure to remove a file:
LINK : fatal error LNK1104: cannot open file ...
error: failed to remove file ..., Access is denied (os error 5)
Example run: https://github.com/rust-lang-ci/rust/actions/runs/10537107932/job/29198090275
Is it possible that something changed that would cause this? Even if not and this is a problem with our tooling, we could use assistance debugging.
Further context, links to failed jobs, and attempts to reproduce are at rust-lang/rust#127883. Almost every PR showing up in the mentions list is from one of these failures. These errors are similar to what was reported in #4086.
Cc @ChrisDenton and @ehuss who have been working to reproduce this.
Platforms affected
Runner images affected
Image version and build link
Is it regression?
Yes, around 2024-06-27, but the exact start is unknown. It has seemingly gotten significantly worse in the past week or so; that job has had at least a 25% failure rate from this issue in the past couple of days (probably close to 50%).
Expected behavior
Accessing or removing the files should succeed.
Actual behavior
The file operations are encountering spurious failures, as linked above.
Repro steps
No known consistent reproduction.
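For anyone who wants to poke at this on a runner, here is a rough sketch of a stress loop that creates, closes, and removes files in a tight loop and records any OS-level error. This is only an assumption about what might be worth trying, not what the rust-lang/rust CI actually runs; the names stress_create_remove and summarize are made up for illustration, and the loop does not involve link.exe at all. A clean run proves nothing, since the failures are spurious.

```python
import os
import tempfile
from collections import Counter


def stress_create_remove(directory, iterations=1000):
    """Create, write, close, and delete a file repeatedly.

    Hypothetical helper: returns a list of
    (iteration, operation, errno, winerror) tuples for every OSError seen.
    """
    failures = []
    for i in range(iterations):
        path = os.path.join(directory, f"probe_{i}.tmp")
        try:
            # The file handle is closed when the with-block exits.
            with open(path, "wb") as f:
                f.write(b"x" * 4096)
        except OSError as e:
            failures.append((i, "open", e.errno, getattr(e, "winerror", None)))
            continue
        try:
            os.remove(path)
        except OSError as e:
            # On Windows, "Access is denied" shows up as winerror 5.
            failures.append((i, "remove", e.errno, getattr(e, "winerror", None)))
    return failures


def summarize(failures):
    """Count failures per (operation, error code), preferring winerror."""
    return Counter(
        (op, winerror if winerror is not None else errno)
        for _, op, errno, winerror in failures
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        result = stress_create_remove(d, iterations=200)
        print(f"{len(result)} failures", dict(summarize(result)))
```

Running it on both the small and large runners and comparing the counts might show whether plain create/remove cycles are enough to trigger the error, or whether something specific to the build is needed.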