Skip to content

Conversation

@IssaneRen
Copy link

Summary

This PR re-introduces the fix for a rare concurrency issue in AdvancedFileOutput that can cause
Bad state: StreamSink is closed exceptions when a timer-triggered flush overlaps with a file rotation or sink closing.

Background

Sorry for closing the previous PR.
I initially submitted this fix against version 2.x, where we first observed the issue in production.
However, since all 2.x branches have now been removed, I re-tested on the latest 3.0.0 branch and confirmed
that the issue still occasionally occurs.
Therefore, I’m re-submitting this PR targeting v3.

Root Cause

When _flushBuffer() (triggered by the flush timer) and _updateTargetFile() (triggered by _targetFileUpdater
for rotation) execute concurrently, both may call _closeSink() or _openSink().
If the timer fires while the sink is being closed or reopened, the old _IOSink becomes invalid,
resulting in the runtime error:

Bad state: StreamSink is closed
#0 _StreamSinkImpl.add (dart:io/io_sink.dart:152)
#1 _IOSinkImpl.write (dart:io/io_sink.dart:287)
#2 _IOSinkImpl.writeAll (dart:io/io_sink.dart:298)
#3 AdvancedFileOutput._flushBuffer (package:logger/src/outputs/advanced_file_output.dart:183)
#4 AdvancedFileOutput.init.<anonymous closure> (package:logger/src/outputs/advanced_file_output.dart:161)

How to Reproduce

The issue is rare in normal configurations but can be consistently reproduced by:
• Setting maxDelay and fileUpdateDuration to very short intervals (tens or hundreds of milliseconds);
• Using a very small maxBufferSize and maxFileSizeKB to force frequent flushes and rotations;
• Continuously writing large log messages in a tight loop.

In this setup, when _targetFileUpdater triggers _updateTargetFile() and calls _closeSink()
while another _flushBuffer() is still running, the flush timer tries to write to a closed sink,
throwing the above exception.

Fix
• Add synchronization guards to ensure that _flushBuffer, _closeSink, and _updateTargetFile
do not run concurrently.
• Skip or delay flush attempts if _sink is already closing or reopening.

This ensures file rotation and flush operations are serialized and prevents writing to a closed sink.

Notes
• Originally observed in production (monitored via Sentry).
• Still reproducible in v3.0.0 under stress test conditions.
• This fix is fully backward-compatible and safe for existing users.

@Bungeefan
Copy link
Member

Sorry, but now I am confused.
You opened the first PR targeting the v3 branch, then closed it saying it was a mistake.
Now you opened this one, again targeting the v3 branch, and are talking about missing 2.x branches (which never existed?).

Additionally, I still can't reproduce your issue and without any way to test for it and I can't really accept a (working?) fix as it will be nearly guaranteed to regress if not taken into account in future changes.
And you still didn't tell me which commit you are actually using from the v3 branch.

Besides all that, your explanation of the "root cause" also doesn't make sense to me.

  • Yes there are two timers that can run concurrently, however, the flushBuffer timer never opens or closes a sink?
  • And even if it were, as soon as closeSink is called, the current sink instance is copied from the class scope and then instantly removed from it, so there should be no chance of it being used outside the method after this.

And to be honest, this description looks a lot like it was generated by AI...

@IssaneRen
Copy link
Author

IssaneRen commented Nov 3, 2025

Hi, thanks for the detailed follow-up — and sorry for the confusion earlier.
1. About my previous messages
You’re right to call this out. I did use an AI tool to polish my English in the earlier comment. That was only to help with translation (I drafted in Chinese first), but I didn’t trim the extra wording well enough. This time I rewrote it myself to be concise and clear.
2. About branches
As the screenshot shows, all released 2.x versions are tagged and there’s no active 2.x branch I can target. That’s why I first tried a 2.x fix, then re-tested on the 3.0.0 branch and opened a PR there.
image

3.	What I see in v2.6.0 (example code and sequence)

I’ll describe the race with the code structure from 2.6.0 because it’s easier to explain:
• We have a periodic task _targetFileUpdater that calls _updateTargetFile().
• When the file grows beyond the limit, _updateTargetFile() does a rotation:
await _closeSink(); await _file.rename(...); await _openSink();
• If an exception happens, the catch also calls _closeSink() then _openSink().
• Meanwhile, if I/O is slow (which can happen in the wild), the next _flushBuffer() (timer) may already start running.
Because Dart is single-threaded and every await yields to the event loop, the interleaving can be:

_flushBuffer():  if (_sink == null) return;  // passes the null check
// ---- yields to event loop ----

_closeSink():    await _sink?.flush();        // takes time
                 await _sink?.close();        // closes the underlying IOSink
                 // _sink is set to null at the end of closeSink
// ---- yields back to _flushBuffer() before _sink is set to null ----

_flushBuffer():  _sink?.writeAll(...);        // _sink is a non-null reference here,
                                              // but the underlying IOSink is already closed
                                              // => "Bad state: StreamSink is closed"

In short: the _sink field is nulled only at the end of _closeSink(). During the awaits inside _closeSink(), other tasks (like the flush timer) can resume and use a non-null _sink that points to a closed IOSink, which throws.
The relevant code in 2.6.0 looks like this (simplified):

Future<void> _updateTargetFile() async {
  try {
    if (await _file.exists() &&
        await _file.length() > _maxFileSizeKB * 1024) {
      await _closeSink();
      await _file.rename('$_path/${_fileNameFormatter(DateTime.now())}');
      await _deleteRotatedFiles();
      await _openSink();
    }
  } catch (e, s) {
    print(e);
    print(s);
    await _closeSink();
    await _openSink();
  }
}

void _flushBuffer() {
  if (_sink == null) return;
  for (final event in _buffer) {
    _sink?.writeAll(event.lines, Platform.isWindows ? '\r\n' : '\n');
    _sink?.writeln();
  }
  _buffer.clear();
}
4.	How I reproduce it locally

I couldn’t reproduce it with default timings either. But I can make it happen reliably by shortening the timers and forcing frequent rotation:
• Set maxDelay and fileUpdateDuration to very small values (milliseconds / low hundreds of ms).
• Set small maxBufferSize and very small maxFileSizeKB.
• Write large log lines in a tight loop.
Under these settings, the rotation path (_updateTargetFile() → _closeSink()) overlaps with the flush timer, and you eventually hit the write on a closed IOSink. In Sentry we only saw it in the wild (rarely), which makes sense because it depends on timing/I/O variance.
5. About the fix direction
To avoid this race, the options is:
• Mark a “closing” state before await _sink.close() and make _flushBuffer() bail out when closing

Thanks again, and sorry for the noise earlier without enough checking for AI translation Results.

@Bungeefan
Copy link
Member

Thanks for your explanation!

  1. Thanks, and please try to avoid using ChatGPT for translation purposes, especially for technical details (e.g. use Google Translate or DeepL instead).

  2. As I said, there were never any dedicated 2.x branches, the 2.x branch is also the default branch (main), as it is still the latest stable version.

  3. Your problem seems a lot like the one reported in AdvancedFileOutput causes StateError #99, and that was also the reason why I specifically asked for the exact commit from the v3 branch you are using (which you still didn't tell me).
    You should be able to check for the commit ref in your pubspec.lock.

    Anyway, my guess would be that you are using an older "version" of the v3 branch, which was still missing this specific bug fix (a27ba42), which fixes the problematic behavior you described above:

    // _sink is set to null at the end of closeSink

    Please try to reproduce it using either at least v2.6.1 or the latest commit from the v3 branch.


Please note: The 3.0.0 branch is an experimental branch, neither is its API stable nor is it considered ready for production (and therefore not really supported yet) - use it only at your own risk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants