Skip to content

Close non-stdio file handles when daemonizing #2314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ntrrgc
Copy link

@ntrrgc ntrrgc commented Jan 8, 2025

Otherwise, when the compiler wrapper spawns the sccache server, the server may end up with unintended file descriptors, which can lead to unexpected problems.

This is particularly problematic if any of those file descriptors that accidentally end up in the server process is a pipe, as the pipe will only be closed when all the processes with that file descriptor close it or exit.

This was causing sccache to hang ninja, as ninja uses the EOF of a pipe given to the subprocess to detect when that subprocess has exited: ninja-build/ninja#2444 (comment)

This patch adds a dependency on the
close_fds crate, which automatically chooses an appropriate mechanism to close a range of file descriptors. On Linux 5.9+ that mechanism will be libc::close_range().

Fixes #2313

@ntrrgc
Copy link
Author

ntrrgc commented Jan 9, 2025

The test test_rust_cargo_cmd_readonly_preemtive_block was failing. After investigation, I found the reason:

if env::var("SCCACHE_ERROR_LOG").is_ok() {
    let f = create_error_log()?;
    // Can't report failure here, we're already daemonized.
    daemonize()?;
    redirect_error_log(f)?;

That code was opening a log file, daemonizing, then swapping stderr with it. We could pass the file descriptor of f to the list of exceptions of close_open_fds(), but in this particular case there is no need, as swapping stderr with a file is already a built-in feature of Daemonize.

I'm updating the patch to make daemonize accept a log file as argument and do the file handle swap at that point, which also removes the need for the redirect_stderr() and redirect_error_log() functions in sccache.

@ntrrgc ntrrgc force-pushed the 2025-01-08-close-fds branch from 0fc0c5a to 2c22e92 Compare January 9, 2025 23:05
@ntrrgc
Copy link
Author

ntrrgc commented Jan 21, 2025

Several CI jobs are consistently failing, so probably there are still file handles unaccounted. Will have to reproduce the failures locally and find the missing handles.

@ntrrgc
Copy link
Author

ntrrgc commented Jan 31, 2025

I checked the failures now and it seems it is just formatting nits and failures to build on Windows. If there was something more serious, I can't see it now...

Oh well, I'm updating the patch with the formatting nits applied and with the close_fd crate marked for use only in cfg(unix).

@ntrrgc ntrrgc force-pushed the 2025-01-08-close-fds branch from 2c22e92 to c619955 Compare January 31, 2025 17:01
@ntrrgc
Copy link
Author

ntrrgc commented Feb 13, 2025

Any thoughts on the patch? Can someone approve the CI workflows so they can run?

@sylvestre
Copy link
Collaborator

sorry for the latency
it seems that windows is failing :)

@ntrrgc
Copy link
Author

ntrrgc commented Feb 19, 2025

error[E0061]: this function takes 0 arguments but 1 argument was supplied
   --> src\commands.rs:645:17
    |
645 |                 daemonize(Some(f))?;
    |                 ^^^^^^^^^ ------- unexpected argument of type `std::option::Option<fs_err::File>`
    |
note: function defined here
   --> src\util.rs:942:8
    |
942 | pub fn daemonize() -> Result<()> {
    |        ^^^^^^^^^
help: remove the extra argument

This is very odd because daemonize() is modified by this patch to add that extra argument, and you can see that in the diff with those same line numbers:

diff --git a/src/util.rs b/src/util.rs
index 7174681..04d348c 100644
--- a/src/util.rs
+++ b/src/util.rs
@@ -841,7 +841,7 @@ impl<'a> Hasher for HashToDigest<'a> {
 
 /// Pipe `cmd`'s stdio to `/dev/null`, unless a specific env var is set.
 #[cfg(not(windows))]
-pub fn daemonize() -> Result<()> {
+pub fn daemonize(log_file: Option<File>) -> Result<()> {

It's as if the patch had been only partially applied by the CI server. The same error is happening in many of them.

@ntrrgc
Copy link
Author

ntrrgc commented Feb 19, 2025

Oooooh, because they're the Windows builds! I keep forgetting the #[cfg(not(windows))]. There has to be a Windows specific variant that I need to update.

@ntrrgc ntrrgc force-pushed the 2025-01-08-close-fds branch from c619955 to 8406beb Compare February 19, 2025 10:21
@ntrrgc
Copy link
Author

ntrrgc commented Feb 19, 2025

toml_format was complaining about the packages not being in alphabetical order (it didn't explicitly say it, but I could confirm it locally).

error: function `redirect_stderr` is never used
   --> src\commands.rs:130:4
    |
130 | fn redirect_stderr(f: File) {
    |    ^^^^^^^^^^^^^^^

redirect_stderr() should actually be used. daemonize() in Windows used to be a full no-op, but now that it takes the log file as argument, it should also set it as stderr in Windows, so the API is the same for all platforms. Fixed it by moving redirect_stderr() to util.rs and using it in the Windows version of daemonize():

/// Daemonizing is a no-op on Windows, but we still must set the log file as
/// stderr.
#[cfg(windows)]
pub fn daemonize(log_file: Option<File>) -> Result<()> {
    if let Some(log_file) = log_file {
        redirect_stderr(log_file);
    }
    Ok(())
}

The failures of the test_rust_cargo_cmd_readonly_preemtive_block in Windows are just a consequence of that log redirection missing in Windows, so that should be fixed now too:

    let log_contents = fs::read_to_string(sccache_log)?;
    assert!(predicates::str::contains("server has setup with ReadOnly").eval(log_contents.as_str()));

Hopefully no more remaining failures!

@ntrrgc ntrrgc force-pushed the 2025-01-08-close-fds branch from 8406beb to 5a3a6be Compare February 19, 2025 14:35
@AJIOB
Copy link
Contributor

AJIOB commented Apr 12, 2025

Hi @sylvestre,

Is it ok for you now?

Otherwise, when the compiler wrapper spawns the sccache server, the
server may end up with unintended file descriptors, which can lead to
unexpected problems.

This is particularly problematic if any of those file descriptors that
accidentally end up in the server process is a pipe, as the pipe will
only be closed when all the processes with that file descriptor close it
or exit.

This was causing sccache to hang ninja, as ninja uses the EOF of a pipe
given to the subprocess to detect when that subprocess has exited:
ninja-build/ninja#2444 (comment)

This patch adds a dependency on the
[close_fds](https://crates.io/crates/close_fds) crate, which
automatically chooses an appropriate mechanism to close a range of file
descriptors. On Linux 5.9+ that mechanism will be libc::close_range().

Fixes mozilla#2313
@sylvestre sylvestre force-pushed the 2025-01-08-close-fds branch from 5a3a6be to 8125bae Compare April 12, 2025 15:55
@codecov-commenter
Copy link

codecov-commenter commented Apr 12, 2025

Codecov Report

Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 65.63%. Comparing base (733688f) to head (8125bae).

Files with missing lines Patch % Lines
src/bin/sccache-dist/main.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2314      +/-   ##
==========================================
- Coverage   65.66%   65.63%   -0.04%     
==========================================
  Files          65       65              
  Lines       35496    35531      +35     
==========================================
+ Hits        23310    23321      +11     
- Misses      12186    12210      +24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sccache could be more zealous closing file descriptors when daemonizing
4 participants