
Conversation

@Dumbris (Contributor) commented Nov 3, 2025

Summary

Fixed a race condition where Docker recovery monitor goroutines would log after tests completed, causing a panic: "Log in goroutine after test has completed".

Root Cause

The issue occurred when:

  1. Tests create a Runtime with upstream manager
  2. Docker recovery monitor starts a goroutine (handleDockerUnavailable)
  3. Test completes and calls Close(), canceling the context
  4. Goroutine wakes from the timer (after 2-5 seconds) and logs without checking whether the context was cancelled, as sketched below
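
A minimal sketch of the racy shape (hypothetical code: the timer duration, logger wiring, and surrounding loop are illustrative, not the actual manager.go source):

```go
// Buggy pattern: the select exits through the timer arm, but nothing
// re-checks ctx before logging. If Close() cancels ctx while this
// goroutine sleeps, the log call lands after the test has returned,
// and a test-bound logger panics with
// "Log in goroutine after test has completed".
go func() {
	select {
	case <-ctx.Done():
		return
	case <-time.After(2 * time.Second):
		logger.Info("Docker still unavailable, will retry") // races with test teardown
	}
}()
```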

Changes

File: internal/upstream/manager.go

  1. Added context check after timer fires (handleDockerUnavailable:1823-1828)

    • After time.After fires, immediately check if context was cancelled
    • Prevents logging and further operations if shutdown is in progress
  2. Added context checks in reconnection goroutine (handleDockerUnavailable:1851-1865)

    • Check context before calling ForceReconnectAll
    • Check context again before logging results
    • Prevents goroutine from logging after test cleanup
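
A minimal sketch of the fixed shape, assuming zap for logging; the retryDelay parameter and the ForceReconnectAll signature are illustrative, not the exact manager.go code:

```go
func (m *Manager) handleDockerUnavailable(ctx context.Context, retryDelay time.Duration) {
	select {
	case <-ctx.Done():
		return
	case <-time.After(retryDelay):
		// The timer firing and the context being cancelled can race:
		// re-check immediately, before logging or any further work.
		if ctx.Err() != nil {
			return
		}
	}
	m.logger.Info("Docker available again, reconnecting upstream servers")

	go func() {
		if ctx.Err() != nil {
			return // shutdown began while this goroutine was scheduled
		}
		reconnected := m.ForceReconnectAll(ctx)
		if ctx.Err() != nil {
			return // never log once test cleanup has started
		}
		m.logger.Info("Docker recovery complete", zap.Int("reconnected", reconnected))
	}()
}
```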

Test Results

✅ All tests pass locally with race detection:

```bash
go test -v -race -timeout 2m ./internal/runtime/...
```

✅ Specific failing test now passes:

```bash
go test -v -race -run TestUpdateListenAddressValidation ./internal/runtime/...
```

Fixes

  • Resolves test failures in CI:
    • Unit Tests (macos-latest, 1.25)
    • End-to-End Tests (macos-latest, 1.23.10)
  • Prevents goroutine leak in tests
  • Ensures clean shutdown without panics

Related Issues

Fixes errors from:

🤖 Generated with Claude Code

…ker recovery

Fixed a race condition where Docker recovery monitor goroutines would
log after tests completed, causing panic: "Log in goroutine after test
has completed".

The issue occurred when:
1. Tests create a Runtime with upstream manager
2. Docker recovery monitor starts a goroutine
3. Test completes and calls Close(), canceling context
4. Goroutine wakes from timer and logs without checking context

Changes:
- Added context check in handleDockerUnavailable after timer fires
- Added context checks in reconnection goroutine before logging
- Ensures all goroutines respect context cancellation

Fixes test failures in:
- TestUpdateListenAddressValidation (internal/runtime)
- All runtime tests with race detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
github-actions bot commented Nov 3, 2025

📦 Build Artifacts for PR #122

Version: v0.9.24-dev-4957fb3-pr122
Latest tag: v0.9.24

Available Artifacts

  • frontend-dist-pr (177 KB)
  • archive-linux-amd64 (9.7 MB)
  • archive-linux-arm64 (8.8 MB)
  • archive-darwin-arm64 (16 MB)
  • installer-dmg-darwin-arm64 (18 MB)
  • archive-darwin-amd64 (18 MB)
  • installer-dmg-darwin-amd64 (20 MB)
  • archive-windows-amd64 (18 MB)
  • archive-windows-arm64 (16 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

```bash
gh run download 19045236081 --repo smart-mcp-proxy/mcpproxy-go

# Or download a specific artifact:
gh run download 19045236081 --name dmg-darwin-arm64
```

Note: Artifacts expire in 14 days.

@Dumbris merged commit 5226c36 into main Nov 3, 2025 (20 checks passed)
@Dumbris deleted the fix/docker-recovery-goroutine-leak branch November 3, 2025 18:40
Dumbris added a commit that referenced this pull request Nov 3, 2025
…n Windows

Fixed Windows test failures by adding context checks before logging in
startDockerRecoveryMonitor. This prevents goroutines from logging after
test completion.

The issue occurred when:
1. Tests create a Runtime with upstream manager
2. Docker recovery monitor checks availability and finds Docker unavailable
3. Test completes and calls Close(), canceling context
4. Goroutine logs warning synchronously without checking context cancellation

Changes:
- Add context check before logging in initial Docker availability check
- Add context check before logging in periodic monitoring ticker
- Ensures goroutines respect context cancellation on all platforms
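
A sketch of the guarded monitor, assuming a checkDockerAvailability helper and a 30-second interval (both illustrative):

```go
func (m *Manager) startDockerRecoveryMonitor(ctx context.Context) {
	// Initial availability check: Close() may already have cancelled
	// the context, so guard the warning with ctx.Err().
	if err := m.checkDockerAvailability(); err != nil && ctx.Err() == nil {
		m.logger.Warn("Docker unavailable", zap.Error(err))
	}

	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if ctx.Err() != nil {
				return // cancelled while the tick was being delivered
			}
			if err := m.checkDockerAvailability(); err != nil && ctx.Err() == nil {
				m.logger.Warn("Docker still unavailable", zap.Error(err))
			}
		}
	}
}
```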

Fixes Windows test failure:
- Unit Tests (windows-latest, 1.23.10)
  https://github.com/smart-mcp-proxy/mcpproxy-go/actions/runs/19045548056/job/54392774796

Complements PR #122 which fixed the same issue on macOS/Linux.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Nov 3, 2025
…ecovery retry

Fixed another goroutine leak where Docker recovery retry logging occurred
after test completion, causing panic: "Log in goroutine after test has
completed".

The issue occurred in handleDockerUnavailable when:
1. Timer fires and checkDockerAvailability is called
2. Docker check fails (returns error)
3. Test completes and context is cancelled
4. Goroutine logs "Docker recovery retry" without checking context

This completes the fix for Docker recovery monitor logging issues by
adding context checks before ALL logging statements.

Changes:
- Add context check before "Docker recovery retry" log (line 1896)
- Ensures goroutine exits cleanly if context was cancelled during check
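
The relevant arm of the retry loop, sketched as a fragment (retryDelay and checkDockerAvailability are illustrative names):

```go
case <-time.After(retryDelay):
	if err := m.checkDockerAvailability(); err != nil {
		// The availability check itself takes time; the context may
		// have been cancelled while it ran. Exit instead of logging.
		if ctx.Err() != nil {
			return
		}
		m.logger.Debug("Docker recovery retry", zap.Error(err))
		continue // wait for the next timer
	}
```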

Fixes test failures:
- End-to-End Tests (macos-latest, 1.23.10)
  https://github.com/smart-mcp-proxy/mcpproxy-go/actions/runs/19046210469

Complements PR #122 and PR #123.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Nov 3, 2025
…ellation

Fixed OAuth event monitor goroutine leak where logging occurred after test
completion, causing panic: "Log in goroutine after test has completed".

The issue occurred in startOAuthEventMonitor when:
1. Ticker fires (every 5 seconds)
2. processOAuthEvents() is called
3. Test completes and context is cancelled
4. processOAuthEvents() logs debug message without checking context

This is similar to the Docker recovery monitor issues fixed in PR #122, #123, #124.

Changes:
- Add context check after ticker fires, before calling processOAuthEvents()
- Return immediately if context was cancelled during ticker wait
- Prevents goroutine from logging after test cleanup
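
A sketch of the guarded ticker loop (processOAuthEvents is named in the commit; the rest of the wiring is assumed):

```go
func (m *Manager) startOAuthEventMonitor(ctx context.Context) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// A tick and cancellation can arrive together; return rather
			// than let processOAuthEvents log during test teardown.
			if ctx.Err() != nil {
				return
			}
			m.processOAuthEvents()
		}
	}
}
```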

Fixes test failures:
- Unit Tests (windows-latest, 1.23.10)
  https://github.com/smart-mcp-proxy/mcpproxy-go/actions/runs/19046681617/job/54396553672

Error:
```
panic: Log in goroutine after TestUpdateListenAddressValidation has completed:
2025-11-03T19:28:45.714Z	DEBUG	processOAuthEvents: checking for OAuth completion events...

goroutine 144 [running]:
mcpproxy-go/internal/upstream.(*Manager).processOAuthEvents(0xc0006b64d0)
	D:/a/mcpproxy-go/mcpproxy-go/internal/upstream/manager.go:1493 +0x9e
mcpproxy-go/internal/upstream.(*Manager).startOAuthEventMonitor(0xc0006b64d0, {0x1414d55f8, 0xc000600a50})
	D:/a/mcpproxy-go/mcpproxy-go/internal/upstream/manager.go:1474 +0x1e8
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Nov 4, 2025
…utdown

## Problem
After PR #129, tests were still failing on macOS with "panic: Log in goroutine after
test completed". The `ctx.Err()` checks narrowed the window between checking the
context and logging, but goroutines could still be running when tests completed
because there was no synchronization mechanism waiting for them to exit.

## Root Cause
The Manager struct had `shutdownCtx` and `shutdownCancel` for signaling
shutdown, but **no `sync.WaitGroup` to track background goroutines**. When
`ShutdownAll()` or `DisconnectAll()` was called:
1. Context was cancelled
2. Function returned immediately
3. Background goroutines (OAuth monitor, Docker recovery) were still running
4. Test completed and marked `testing.T` as done
5. Goroutines tried to log → panic

## Solution
Added proper goroutine lifecycle management:

1. **Manager.shutdownWg** field - Tracks all background goroutines
2. **Goroutine launches** - Call `shutdownWg.Add(1)` before spawning:
   - `startOAuthEventMonitor` (in NewManager)
   - `startDockerRecoveryMonitor` (in NewManager and dynamic restart)
3. **Goroutine exits** - Call `defer shutdownWg.Done()` in both monitor functions
4. **Shutdown waiting** - Call `shutdownWg.Wait()` in:
   - `ShutdownAll()` - with 3-second timeout
   - `DisconnectAll()` - blocking wait

## Changes
- `Manager` struct: Added `shutdownWg sync.WaitGroup`
- `NewManager()`: Added `Add(1)` before launching monitors
- `startOAuthEventMonitor()`: Added `defer Done()` at start
- `startDockerRecoveryMonitor()`: Added `defer Done()` at start
- Dynamic recovery restart: Added `Add(1)` before `go startDockerRecoveryMonitor()`
- `ShutdownAll()`: Cancel context, wait for goroutines with timeout
- `DisconnectAll()`: Cancel context, wait for goroutines
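
A compact sketch of the lifecycle (only shutdownCtx, shutdownCancel, shutdownWg, and the two monitor names come from this commit; everything else is illustrative):

```go
type Manager struct {
	shutdownCtx    context.Context
	shutdownCancel context.CancelFunc
	shutdownWg     sync.WaitGroup
}

func NewManager() *Manager {
	ctx, cancel := context.WithCancel(context.Background())
	m := &Manager{shutdownCtx: ctx, shutdownCancel: cancel}

	// Add(1) BEFORE `go ...`, never inside the goroutine, or Wait()
	// can return before the goroutine has registered itself.
	m.shutdownWg.Add(1)
	go m.startOAuthEventMonitor(ctx)
	m.shutdownWg.Add(1)
	go m.startDockerRecoveryMonitor(ctx)
	return m
}

func (m *Manager) startOAuthEventMonitor(ctx context.Context) {
	defer m.shutdownWg.Done() // first statement, so every exit path counts down
	<-ctx.Done()              // monitor loop elided
}

func (m *Manager) startDockerRecoveryMonitor(ctx context.Context) {
	defer m.shutdownWg.Done()
	<-ctx.Done() // monitor loop elided
}

func (m *Manager) ShutdownAll() {
	m.shutdownCancel()

	// sync.WaitGroup has no built-in timeout, so wait in a helper
	// goroutine and race it against a timer.
	done := make(chan struct{})
	go func() {
		m.shutdownWg.Wait()
		close(done)
	}()
	select {
	case <-done: // all background goroutines exited
	case <-time.After(3 * time.Second):
		// Give up waiting; shutdown must not hang forever.
	}
}
```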

## Testing
```bash
go test ./internal/runtime -run TestUpdateListenAddressValidation -v -race -count=5
# Result: 5/5 passed, no race conditions

go test ./internal/upstream ./internal/runtime -v -race -count=2
# Result: All tests passed
```

## Related
- Fixes persistent issue from PR #122, #126, #127, #129
- Complements the `ctx.Err()` checks added in PR #129

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>