Windows file locking errors #4086

Closed
1 of 8 tasks
ehuss opened this issue Sep 15, 2021 · 6 comments
Labels
investigate Collect additional information, like space on disk, other tool incompatibilities etc. OS: Windows

Comments


ehuss commented Sep 15, 2021

Description

Starting around 2021-09-13 11:30 am PST, the rust-lang project has noticed a large failure rate on our Windows runners. We are experiencing sporadic errors reading, copying, and creating executables with various errors related to file locks. Some examples are:

  • OS error 32 (ERROR_SHARING_VIOLATION) copying an executable.
  • "LNK1104: cannot open file" when running the MSVC linker.
  • "rm: cannot remove '...some.exe ': Device or resource busy"

We are also getting reports from other projects experiencing similar errors.

I wanted to check whether there were any unannounced changes to windows-latest-xl, or whether any new scanning features are running (Windows Defender, indexing service, etc.).

More information, with links to failed runs, can be found at rust-lang/rust#88924. Most of the runs at https://github.com/rust-lang-ci/rust/actions are also currently failing due to this error.

Virtual environments affected

  • Ubuntu 16.04
  • Ubuntu 18.04
  • Ubuntu 20.04
  • macOS 10.15
  • macOS 11
  • Windows Server 2016
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

Links to failed builds:

https://github.com/rust-lang-ci/rust/runs/3591374923
https://github.com/rust-lang-ci/rust/runs/3591788935
https://github.com/rust-lang-ci/rust/runs/3593656760
https://github.com/rust-lang-ci/rust/runs/3594233900
https://github.com/rust-lang-ci/rust/runs/3594557656
https://github.com/rust-lang-ci/rust/runs/3592301669
https://github.com/rust-lang-ci/rust/runs/3600286433
https://github.com/Lokathor/wide/runs/3592322856
https://github.com/PyO3/pyo3/runs/3601829130
https://github.com/PyO3/pyo3/runs/3590459443

Is it regression?

No response

Expected behavior

Windows runners shouldn't have any services or issues locking files during a build.

Actual behavior

Windows runners are experiencing a high rate of file-lock errors when executables are created, copied, or removed.

Repro steps

Reproduction may be difficult: the failures occur inside a large build system, they happen somewhat randomly, and it is not known what is causing them. Roughly, most of the errors seem related to creating a new executable and then immediately trying to copy it to a new location. We are also seeing errors with link.exe failing to read files.
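
The create-then-immediately-copy pattern described above can be approximated outside the build system with a small probe. This is only a sketch under stated assumptions: the function names, iteration count, and file size are hypothetical, and the probe writes zero-filled files with an .exe name rather than real executables, so whatever is holding the locks may not react to them in the same way. The only value taken from the report is error code 32 (ERROR_SHARING_VIOLATION).

```python
import shutil
import tempfile
from pathlib import Path

# Windows error code quoted in the report (ERROR_SHARING_VIOLATION).
ERROR_SHARING_VIOLATION = 32


def is_sharing_violation(exc):
    """True if exc carries the Windows sharing-violation code.

    On non-Windows platforms OSError has no winerror value, so this is False.
    """
    return getattr(exc, "winerror", None) == ERROR_SHARING_VIOLATION


def create_then_copy(iterations=20, size=64 * 1024):
    """Write a file, copy it straight away, and collect any OSError raised.

    Hypothetical probe: returns a list of (iteration, exception) pairs.
    """
    failures = []
    root = Path(tempfile.mkdtemp(prefix="lock-probe-"))
    try:
        for i in range(iterations):
            src = root / f"probe-{i}.exe"
            dst = root / f"probe-{i}-copy.exe"
            try:
                src.write_bytes(b"\0" * size)
                # No delay between creation and copy, as in the report.
                shutil.copyfile(src, dst)
            except OSError as exc:
                failures.append((i, exc))
    finally:
        # Cleanup may itself hit locked files; ignore that here.
        shutil.rmtree(root, ignore_errors=True)
    return failures
```

Running `create_then_copy()` on an affected runner and counting the entries for which `is_sharing_violation` is true would give a rough failure rate; on a healthy machine the list is expected to be empty.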

@Darleev Darleev added OS: Windows investigate Collect additional information, like space on disk, other tool incompatibilities etc. and removed needs triage OS: Windows labels Sep 15, 2021
@dibir-magomedsaygitov dibir-magomedsaygitov self-assigned this Sep 15, 2021

CryZe commented Sep 15, 2021

The easiest way to replicate this every single time is to fork wide at this commit and remove the following marked changes: Lokathor/wide@844d8c7#diff-73e17259d77e5fbef83b2bdbbe4dc40a912f807472287f7f45b77e0cbf78792dR78-R85

Edit: It seems GitHub doesn't link this properly. It's the following env var that needs to be removed in both places:

      env:
        CARGO_TARGET_DIR: "target-native"

Or, honestly, the commit before this one should cause the bug too. Either way, just make sure the env var is not set: it changes the location of the target executable so that the build doesn't try to overwrite the previously built executable, which is exactly when the bug occurs. Instead of running another build there, you can also just try to delete the target folder or the specific executable with a normal PowerShell command. That should hit the same problem.
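
To illustrate the two points above (where the build output lands, and the delete attempt), here is a minimal sketch. It assumes a simplified view of Cargo in which only the CARGO_TARGET_DIR variable and the default "target" folder matter, and the helper names `resolve_target_dir` and `try_remove` are hypothetical, not part of any tool mentioned in this thread.

```python
import os
from pathlib import Path


def resolve_target_dir(env=None, workspace="."):
    """Return the folder build output is expected to land in.

    Assumption: only CARGO_TARGET_DIR and the default "target" folder
    are considered; any other Cargo setting is ignored in this sketch.
    """
    env = os.environ if env is None else env
    override = env.get("CARGO_TARGET_DIR")
    return Path(workspace) / (override if override else "target")


def try_remove(path):
    """Try to delete a single file.

    Returns None on success, or the OSError that was raised (for example
    a "resource busy" or sharing-violation style error on a locked file).
    """
    try:
        Path(path).unlink()
        return None
    except OSError as exc:
        return exc
```

With the variable unset, `resolve_target_dir()` points at the default folder, so a second build writes over the earlier executable; with `CARGO_TARGET_DIR: "target-native"` set, it points elsewhere and the overwrite is avoided. Calling `try_remove` on the previously built executable mirrors the manual delete suggested above.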

Contributor

al-cheb commented Sep 15, 2021

@ehuss, @CryZe, Hey,
We have disabled an internal monitor. Could you please rerun builds and check results?


CryZe commented Sep 15, 2021

That seems to have resolved it (at least in wide where it wasn't really sporadic): https://github.com/Lokathor/wide/pull/109/checks?check_run_id=3609115201

Contributor

miketimofeev commented Sep 15, 2021

@CryZe yes, the engineering team identified the root cause and rolled back the changes a couple of hours ago. Sorry for the inconvenience.

@miketimofeev
Contributor

I'm going to close the issue.
Please feel free to contact us if you have any concerns.

Author

ehuss commented Sep 20, 2021

Thanks @miketimofeev for the quick response!
