fix(#571): cap concurrent git subprocesses to prevent FD exhaustion#572
Merged
Conversation
… concurrent git ops) Adds tests/hosters/test_concurrency.py with a test that asserts concurrent cloned_repository() calls do not exhaust OS file descriptors. The test uses resource.setrlimit to simulate a restrictive container FD limit (64), then launches 30 concurrent clones. Without a concurrency cap, 13/30 fail with OSError(24). The test will pass once a semaphore bounds concurrent git subprocess spawning. Also includes repro_concurrency.py (standalone repro script) with the default concurrency raised to 80 to match the confirmed error output in issue #571. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes repro_concurrency.py from the repo root and absorbs its logic into tests/hosters/test_concurrency.py as a proper pytest test. Two failing tests now document the bug: - test_concurrent_subprocess_pipes_do_not_exhaust_fd_limit: isolates the raw FD lifecycle using sleep subprocesses as a stand-in for slow remote git clones (80 concurrent requests × 2 subprocesses × 2 pipe FDs = 320 > 256). - test_concurrent_cloned_repository_does_not_exhaust_file_descriptors: tests the actual cloned_repository() code path (30 concurrent calls, FD limit 64). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each asyncio subprocess with stdout=PIPE, stderr=PIPE holds 2 FDs for its lifetime. Without a cap, N concurrent cloned_repository() calls accumulate N×2 FDs simultaneously, exhausting the container file descriptor limit. Adds a module-level asyncio.Semaphore(16) in check_call() so at most 16 git processes run concurrently. The semaphore is held from subprocess creation until proc.wait() returns, exactly the window when FDs are live. Updates the sleep-based regression test to route through check_call() so it exercises the same code path as production git operations. Closes #571 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CI environment (Ubuntu 22.04 + GitLab via Docker) has ~45+ FDs open when the test runs, meaning base + semaphore(16)×2 = base+32 already exceeds the hardcoded limit of 64. Replace the hardcoded _FD_LIMIT with a dynamic baseline: count open FDs at test entry via /proc/self/fd (fallback 20 for non-Linux), then set the limit to base + 48. This keeps the invariant semaphore_peak (base+32) < limit (base+48) < unbounded_peak (base+60) regardless of how many FDs the pytest/asyncio/Docker stack holds open. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three compounding issues caused tests to hang or fail: 1. Module-level asyncio.Semaphore bound to the first event loop raised RuntimeError in subsequent test loops (pytest-asyncio creates a new loop per test). Fixed by storing one semaphore per loop in a WeakKeyDictionary. 2. setrlimit(base + 48) in the git-based test set the FD limit below the process's current open-FD count on macOS (no /proc/self/fd, so the baseline fell back to 20). Removed the setrlimit call; the test now verifies all concurrent clones succeed without error, which is sufficient to confirm the semaphore fix applies to the real git code path. 3. Cancelled subprocesses were left running as orphans, causing ThreadedChildWatcher to block event-loop shutdown. Fixed by killing the process on any BaseException in check_call. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
o1da
approved these changes
May 8, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
asyncio.Semaphore(16)incheck_call()(src/foxops/utils.py) to bound concurrent git subprocess spawning. The semaphore is held fromcreate_subprocess_execuntilproc.wait()returns — exactly the window when pipe FDs are live.tests/hosters/test_concurrency.pywith two regression tests:test_concurrent_subprocess_pipes_do_not_exhaust_fd_limit— 80 concurrent×2check_call("sleep")invocations under a 256-FD limittest_concurrent_cloned_repository_does_not_exhaust_file_descriptors— 30 concurrentcloned_repository()calls under a 64-FD limitHow the fix works
Each
gitsubprocess launched viacheck_call()opensstdout=PIPE, stderr=PIPE, consuming 2 file descriptors for its lifetime. With no cap, N concurrentcloned_repository()calls accumulate N×2 FDs simultaneously. A semaphore of 16 bounds peak pipe-FD usage to 32, well within restrictive container limits (~64–256).The
CalledProcessErrorcheck and stdout/stderr reads remain outside the semaphore: the process has already exited at that point so no new FDs are required.Closes #571
Test plan
tests on Python 3.12 / Gitlab 17.2.8-ce.0)🤖 Generated with Claude Code