Skip to content

Conversation

@JakeChampion
Copy link
Contributor

@JakeChampion JakeChampion commented Jul 31, 2025

potentially fixes #12197 and #8955

The check_fd() function had a potential bug where it would close and reopen
log files to verify existence, which could inadvertently close stderr/stdout
when logging to these special files, breaking error reporting for the entire
process.

Also, the closing of stderr makes fd(2) available for whoever calls open() next, which for us, was a rocksdb sst file and caused our rocksdb instance to become corrupted.

Fixed by replacing the access() + close/reopen pattern with fstat()
on the open file descriptor. This approach:

  • Uses st_nlink == 0 to detect when regular files have been unlinked
  • Never triggers for special files like stderr/stdout (they maintain st_nlink > 0)

@JakeChampion JakeChampion force-pushed the jake/logfile-check_fd branch from 76b1841 to 7081c6c Compare July 31, 2025 08:21
@JakeChampion JakeChampion changed the title Fix potential stderr closure bug in LogFile::check_fd() Fix stderr closure bug in LogFile::check_fd() using fstat() Jul 31, 2025
@JakeChampion JakeChampion force-pushed the jake/logfile-check_fd branch from 7081c6c to c1faa2d Compare July 31, 2025 08:22
The check_fd() function had a critical bug where it would close and reopen
log files to verify existence, which could inadvertently close stderr/stdout
when logging to these special files, breaking error reporting for the entire
process.

Fixed by replacing the unsafe access() + close/reopen pattern with fstat()
on the open file descriptor. This approach:
- Uses st_nlink == 0 to detect when regular files have been unlinked
- Never triggers for special files like stderr/stdout (they maintain st_nlink > 0)
- Eliminates string comparisons and special-case handling
- Provides universal safety across all file descriptor types
- Maintains detection of externally moved/deleted log files for rotation
@JakeChampion JakeChampion force-pushed the jake/logfile-check_fd branch from c1faa2d to e4157be Compare July 31, 2025 08:23
@JakeChampion JakeChampion marked this pull request as ready for review July 31, 2025 09:58
@bneradt bneradt self-requested a review August 4, 2025 22:11
@bneradt bneradt added this to the 10.2.0 milestone Aug 4, 2025
@bneradt bneradt requested a review from Copilot September 2, 2025 21:45
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a critical bug in the LogFile::check_fd() function where closing and reopening log files could inadvertently close stderr/stdout when logging to these special files, potentially corrupting the process's error reporting and other file descriptors.

  • Replaces the access() + close/reopen pattern with fstat() on the open file descriptor
  • Uses st_nlink == 0 to detect unlinked regular files while preserving special files like stderr/stdout
  • Adds fallback logic for cases where fstat() fails or no valid file descriptor exists

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +734 to 757
if (m_name && is_open()) {
int fd = get_fd();
if (fd >= 0) {
struct stat st;
if (fstat(fd, &st) == 0) {
// If the file has been unlinked, st_nlink will be 0
// This only happens for regular files, not special files like stderr/stdout
if (st.st_nlink == 0) {
close_file();
}
} else {
// fstat failed, the file descriptor may be invalid
// Fall back to path-based existence check
if (!LogFile::exists(m_name)) {
close_file();
}
}
} else {
// No valid fd, fall back to path-based check
if (!LogFile::exists(m_name)) {
close_file();
}
}
}
Copy link

Copilot AI Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested conditional structure creates excessive complexity with duplicated fallback logic. Consider extracting the fallback check into a helper function or restructuring the conditions to reduce nesting and eliminate the duplicate LogFile::exists(m_name) checks.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. You might be fixing an issue commented in the autests too. Can you please try running with this applied to your patch:

diff --git a/tests/gold_tests/logging/log-filenames.test.py b/tests/gold_tests/logging/log-filenames.test.py
index a4fdbc73b..ec6e8928a 100644
--- a/tests/gold_tests/logging/log-filenames.test.py
+++ b/tests/gold_tests/logging/log-filenames.test.py
@@ -257,11 +257,4 @@ class stderrTest(LogFilenamesTest):
 DefaultNamedTest()
 CustomNamedTest()
 stdoutTest()
-
-# The following stderr test can be run successfully by hand using the replay
-# files from the sandbox. All the expected output goes to stderr. However, for
-# some reason during the AuTest run, the stderr output stops emitting after the
-# logging.yaml file is parsed. This is left here for now because it is valuable
-# for use during development, but it is left commented out so that it doesn't
-# produce the false failure in CI and developer test runs.
-# stderrTest()
+stderrTest()

That patch works fine for me locally with or without your patch, but I wonder whether it's related to what you are fixing.

@JakeChampion
Copy link
Contributor Author

Looks good. You might be fixing an issue commented in the autests too. Can you please try running with this applied to your patch:

diff --git a/tests/gold_tests/logging/log-filenames.test.py b/tests/gold_tests/logging/log-filenames.test.py
index a4fdbc73b..ec6e8928a 100644
--- a/tests/gold_tests/logging/log-filenames.test.py
+++ b/tests/gold_tests/logging/log-filenames.test.py
@@ -257,11 +257,4 @@ class stderrTest(LogFilenamesTest):
 DefaultNamedTest()
 CustomNamedTest()
 stdoutTest()
-
-# The following stderr test can be run successfully by hand using the replay
-# files from the sandbox. All the expected output goes to stderr. However, for
-# some reason during the AuTest run, the stderr output stops emitting after the
-# logging.yaml file is parsed. This is left here for now because it is valuable
-# for use during development, but it is left commented out so that it doesn't
-# produce the false failure in CI and developer test runs.
-# stderrTest()
+stderrTest()

That patch works fine for me locally with or without your patch, but I wonder whether it's related to what you are fixing.

I pushed that patch up, the tests passed with it ☺️

Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contributing the fix!

@bneradt bneradt merged commit b13b985 into apache:master Sep 3, 2025
15 checks passed
@JakeChampion JakeChampion deleted the jake/logfile-check_fd branch September 3, 2025 14:56
@bneradt bneradt linked an issue Dec 1, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logging to stdout stops working after a couple of minutes Logs to stdout or stderr stops after a few minutes

2 participants