-
Notifications
You must be signed in to change notification settings - Fork 8
fix(upstream): use parent context for shutdown detection in Docker recovery #126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…covery Fixed a race condition where Docker recovery retry logging occurred after test completion despite having a context check. The issue was using the child `recoveryCtx` instead of parent `ctx` for shutdown detection. Root Cause: - Child context (`recoveryCtx`) cancellation may lag behind parent (`ctx`) - Test cleanup cancels parent context immediately - Tiny window between context check and log execution - On CI machines, this race is more likely due to scheduling delays Changes: - Check parent `ctx.Done()` instead of `recoveryCtx.Done()` (line 1905) - Parent context reflects shutdown state more immediately - Eliminates race window between check and log execution This completes the fix from PR #124 by using the correct context for shutdown detection. Fixes: - Unit Tests (macos-latest, 1.23.10) on main branch - Race condition: "Log in goroutine after test has completed" Error: ``` panic: Log in goroutine after TestUpdateListenAddressValidation has completed: 2025-11-03T20:02:22.176Z INFO Docker recovery retry {"attempt": 1, "elapsed": "2.002340625s", "next_retry_in": "5s"} mcpproxy-go/internal/upstream.(*Manager).handleDockerUnavailable internal/upstream/manager.go:1910 ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
📦 Build Artifacts for PR #126Version: Available Artifacts
How to DownloadOption 1: GitHub Web UI (easiest)
Option 2: GitHub CLI gh run download 19047803960 --repo smart-mcp-proxy/mcpproxy-go
# Or download a specific artifact:
gh run download 19047803960 --name dmg-darwin-arm64
|
Dumbris
added a commit
that referenced
this pull request
Nov 4, 2025
Complete the fix from PR #126 by using parent context consistently throughout handleDockerUnavailable. This eliminates the race condition where child context cancellation lags parent by nanoseconds. Changes: - Line 1841: Changed recoveryCtx.Done() -> ctx.Done() in main select - Line 1846: Changed recoveryCtx.Done() -> ctx.Done() after timer fires This ensures immediate shutdown detection when tests complete, preventing "Log in goroutine after test has completed" panics. Fixes the persistent test failure in TestUpdateListenAddressValidation on macOS. Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Dumbris
added a commit
that referenced
this pull request
Nov 4, 2025
…ecks Replace select-default pattern with ctx.Err() checks to eliminate TOCTOU race condition in Docker recovery monitoring goroutines. The select-default pattern creates a race window where context can be cancelled AFTER the check passes but BEFORE logging happens, causing "Log in goroutine after test has completed" panics. Using ctx.Err() provides atomic context cancellation checking without the race condition. Changes: - handleDockerUnavailable: Use ctx.Err() for all context checks before logging - startDockerRecoveryMonitor: Use ctx.Err() in initial check and periodic monitoring - Added ctx.Err() check before "Docker recovery successful" logging Fixes the persistent test failure in TestUpdateListenAddressValidation that survived PR #126 and PR #127. Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Dumbris
added a commit
that referenced
this pull request
Nov 4, 2025
…utdown ## Problem After PR #129, tests were still failing with "panic: Log in goroutine after test completed" on macOS. The `ctx.Err()` checks prevented the race between checking context and logging, but goroutines could still be running when tests completed because there was no synchronization mechanism. ## Root Cause The Manager struct had `shutdownCtx` and `shutdownCancel` for signaling shutdown, but **no `sync.WaitGroup` to track background goroutines**. When `ShutdownAll()` or `DisconnectAll()` was called: 1. Context was cancelled 2. Function returned immediately 3. Background goroutines (OAuth monitor, Docker recovery) were still running 4. Test completed and marked `testing.T` as done 5. Goroutines tried to log → panic ## Solution Added proper goroutine lifecycle management: 1. **Manager.shutdownWg** field - Tracks all background goroutines 2. **Goroutine launches** - Call `shutdownWg.Add(1)` before spawning: - `startOAuthEventMonitor` (in NewManager) - `startDockerRecoveryMonitor` (in NewManager and dynamic restart) 3. **Goroutine exits** - Call `defer shutdownWg.Done()` in both monitor functions 4. **Shutdown waiting** - Call `shutdownWg.Wait()` in: - `ShutdownAll()` - with 3-second timeout - `DisconnectAll()` - blocking wait ## Changes - `Manager` struct: Added `shutdownWg sync.WaitGroup` - `NewManager()`: Added `Add(1)` before launching monitors - `startOAuthEventMonitor()`: Added `defer Done()` at start - `startDockerRecoveryMonitor()`: Added `defer Done()` at start - Dynamic recovery restart: Added `Add(1)` before `go startDockerRecoveryMonitor()` - `ShutdownAll()`: Cancel context, wait for goroutines with timeout - `DisconnectAll()`: Cancel context, wait for goroutines ## Testing ```bash go test ./internal/runtime -run TestUpdateListenAddressValidation -v -race -count=5 # Result: 5/5 passed, no race conditions go test ./internal/upstream ./internal/runtime -v -race -count=2 # Result: All tests passed ``` ## Related - Fixes persistent issue from PR #122, #126, #127, #129 - Complements the `ctx.Err()` checks added in PR #129 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Fixed a race condition where Docker recovery retry logging occurred after test completion despite having a context check from PR #124. The issue was using the child
recoveryCtxinstead of parentctxfor shutdown detection.Root Cause
PR #124 added a context check before logging, but it checked the wrong context:
The problem:
ctx= Parent context (cancelled when test/manager shuts down)recoveryCtx= Child context created withcontext.WithCancel(ctx)On CI machines with scheduling delays, this tiny race window is hit more frequently.
Changes
File:
internal/upstream/manager.goUse parent context for shutdown detection (line 1905):
Why This Works
ctxis cancelled immediately when shutdown startsTest Results
✅ Tests pass locally with the fix:
go test -v -race -run TestUpdateListenAddressValidation ./internal/runtime/...Fixes
Main branch failure:
Error:
Related PRs
🤖 Generated with Claude Code