Fix hanging cat filter issue by implementing communicate-style I/O #2126
Fixes #2080 where gix-filter would hang indefinitely when using no-op filters like clean=cat or smudge=cat with large files.

Problem
The issue occurred in gix-filter/src/driver/apply.rs at line 114, where std::io::copy was used to write all input data to a child process's stdin before reading any output.

This approach causes a deadlock when:
- The filter process (e.g. cat) reads from stdin and writes to stdout
- The stdout pipe buffer fills up, causing cat to block waiting for its stdout to be read
- std::io::copy blocks waiting to write more data to stdin

Solution
Replaced the problematic std::io::copy with a communicate() function that uses threads to write input to the child's stdin while concurrently reading its output. This prevents deadlocks by ensuring that output is being consumed while input is being written.

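The concurrent approach can be sketched as follows. This is a minimal illustration rather than the code from this PR: the `communicate` signature, taking a program name and an owned input buffer, is an assumption made for the example, and it assumes a Unix-like system where `cat` is on the PATH.

```rust
use std::io::{self, Read, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper (not the PR's actual signature): feed `input` to the
/// child's stdin on a separate thread while the calling thread drains stdout.
fn communicate(program: &str, input: Vec<u8>) -> io::Result<Vec<u8>> {
    let mut child = Command::new(program)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    let mut stdin = child.stdin.take().expect("stdin was configured as piped");
    let mut stdout = child.stdout.take().expect("stdout was configured as piped");

    // Writer thread: `stdin` is dropped when the closure returns, which closes
    // the pipe and lets the child see EOF.
    let writer = thread::spawn(move || -> io::Result<()> { stdin.write_all(&input) });

    // Reading here keeps the child's stdout pipe from filling up, so the child
    // never blocks on output while we are still writing input.
    let mut output = Vec::new();
    stdout.read_to_end(&mut output)?;

    writer.join().expect("writer thread panicked")?;
    child.wait()?;
    Ok(output)
}

fn main() -> io::Result<()> {
    // 1 MiB is well above typical pipe buffer sizes.
    let input = vec![b'x'; 1024 * 1024];
    let output = communicate("cat", input.clone())?;
    assert_eq!(output, input);
    println!("round-tripped {} bytes", output.len());
    Ok(())
}
```

Writing on the spawned thread and reading on the calling thread is one possible split; the reverse would work equally well, as long as both directions make progress at the same time.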
Key changes:
- Added a communicate() function that handles concurrent I/O using threads
- Changed ReadFilterOutput to accept Box<dyn Read> for flexibility with different reader types

Testing
Added comprehensive tests to verify the fix:
- The clean=cat / smudge=cat scenario from #2080, with multiple data sizes (1KB, 10KB, 100KB)

The fix has been verified to work with data sizes that previously caused indefinite hangs while maintaining compatibility with all existing filter implementations.

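When testing for a hang, it helps if a regression fails the test instead of blocking the test run forever. One way to do that is sketched below. The `run_with_timeout` helper is hypothetical and not part of this PR, and the closures are stand-ins for applying a real filter.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical test helper: run `work` on a separate thread and return its
/// result, or `None` if it did not finish within `timeout`.
fn run_with_timeout<T, F>(timeout: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver is gone if we already timed out.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).ok()
}

fn main() {
    // The same data sizes as in the tests described above.
    for size in [1024usize, 10 * 1024, 100 * 1024] {
        // Stand-in for applying a filter to `size` bytes of input.
        let result = run_with_timeout(Duration::from_secs(5), move || vec![0u8; size].len());
        assert_eq!(result, Some(size));
    }

    // Work that outlives the timeout is reported as `None` instead of hanging.
    let hung = run_with_timeout(Duration::from_millis(50), || {
        thread::sleep(Duration::from_millis(500));
    });
    assert!(hung.is_none());
}
```

Note that the worker thread is not cancelled on timeout. It keeps running in the background until it finishes or the process exits, which is usually acceptable in a test binary.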
Performance Impact
Minimal - the solution only affects single-file filter processes (not the more common multi-file process filters) and uses efficient buffering strategies. The threading overhead is negligible compared to process spawn costs.