
Conversation

@Dumbris (Contributor) commented Nov 3, 2025

Summary

Fixed a race condition where Docker recovery monitor goroutines would log after tests completed, causing a panic: "Log in goroutine after test has completed".

Root Cause

The issue occurred when:

  1. Tests create a Runtime with upstream manager
  2. Docker recovery monitor starts a goroutine (handleDockerUnavailable)
  3. Test completes and calls Close(), canceling the context
  4. Goroutine wakes from the timer (after 2-5 seconds) and logs without checking whether the context was cancelled, as sketched below
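
A minimal sketch of the racy shape (hypothetical code: the timer duration, logger wiring, and surrounding loop are illustrative, not the actual manager.go source):

```go
// Buggy pattern: the select exits through the timer arm, but nothing
// re-checks ctx before logging. If Close() cancels ctx while this
// goroutine sleeps, the log call lands after the test has returned,
// and a test-bound logger panics with
// "Log in goroutine after test has completed".
go func() {
	select {
	case <-ctx.Done():
		return
	case <-time.After(2 * time.Second):
		logger.Info("Docker still unavailable, will retry") // races with test teardown
	}
}()
```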

Changes

File: internal/upstream/manager.go

  1. Added context check after timer fires (handleDockerUnavailable:1823-1828)

    • After time.After fires, immediately check if context was cancelled
    • Prevents logging and further operations if shutdown is in progress
  2. Added context checks in reconnection goroutine (handleDockerUnavailable:1851-1865)

    • Check context before calling ForceReconnectAll
    • Check context again before logging results
    • Prevents goroutine from logging after test cleanup
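
A minimal sketch of the fixed shape, assuming zap for logging; the retryDelay parameter and the ForceReconnectAll signature are illustrative, not the exact manager.go code:

```go
func (m *Manager) handleDockerUnavailable(ctx context.Context, retryDelay time.Duration) {
	select {
	case <-ctx.Done():
		return
	case <-time.After(retryDelay):
		// The timer firing and the context being cancelled can race:
		// re-check immediately, before logging or any further work.
		if ctx.Err() != nil {
			return
		}
	}
	m.logger.Info("Docker available again, reconnecting upstream servers")

	go func() {
		if ctx.Err() != nil {
			return // shutdown began while this goroutine was scheduled
		}
		reconnected := m.ForceReconnectAll(ctx)
		if ctx.Err() != nil {
			return // never log once test cleanup has started
		}
		m.logger.Info("Docker recovery complete", zap.Int("reconnected", reconnected))
	}()
}
```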

Test Results

✅ All tests pass locally with race detection:

```bash
go test -v -race -timeout 2m ./internal/runtime/...
```

✅ Specific failing test now passes:

```bash
go test -v -race -run TestUpdateListenAddressValidation ./internal/runtime/...
```

Fixes

  • Resolves test failures in CI:
    • Unit Tests (macos-latest, 1.25)
    • End-to-End Tests (macos-latest, 1.23.10)
  • Prevents goroutine leak in tests
  • Ensures clean shutdown without panics

Related Issues

Fixes errors from:

🤖 Generated with Claude Code

…ker recovery

Fixed a race condition where Docker recovery monitor goroutines would
log after tests completed, causing panic: "Log in goroutine after test
has completed".

The issue occurred when:
1. Tests create a Runtime with upstream manager
2. Docker recovery monitor starts a goroutine
3. Test completes and calls Close(), canceling context
4. Goroutine wakes from timer and logs without checking context

Changes:
- Added context check in handleDockerUnavailable after timer fires
- Added context checks in reconnection goroutine before logging
- Ensures all goroutines respect context cancellation

Fixes test failures in:
- TestUpdateListenAddressValidation (internal/runtime)
- All runtime tests with race detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
github-actions bot commented Nov 3, 2025

📦 Build Artifacts for PR #122

Version: v0.9.24-dev-4957fb3-pr122
Latest tag: v0.9.24

Available Artifacts

  • frontend-dist-pr (177 KB)
  • archive-linux-amd64 (9.7 MB)
  • archive-linux-arm64 (8.8 MB)
  • archive-darwin-arm64 (16 MB)
  • installer-dmg-darwin-arm64 (18 MB)
  • archive-darwin-amd64 (18 MB)
  • installer-dmg-darwin-amd64 (20 MB)
  • archive-windows-amd64 (18 MB)
  • archive-windows-arm64 (16 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

```bash
gh run download 19045236081 --repo smart-mcp-proxy/mcpproxy-go

# Or download a specific artifact:
gh run download 19045236081 --name dmg-darwin-arm64
```

Note: Artifacts expire in 14 days.

@Dumbris merged commit 5226c36 into main Nov 3, 2025 (20 checks passed)
@Dumbris deleted the fix/docker-recovery-goroutine-leak branch November 3, 2025 18:40
Dumbris added a commit that referenced this pull request Nov 3, 2025
…n Windows

Fixed Windows test failures by adding context checks before logging in
startDockerRecoveryMonitor. This prevents goroutines from logging after
test completion.

The issue occurred when:
1. Tests create a Runtime with upstream manager
2. Docker recovery monitor checks availability and finds Docker unavailable
3. Test completes and calls Close(), canceling context
4. Goroutine logs warning synchronously without checking context cancellation

Changes:
- Add context check before logging in initial Docker availability check
- Add context check before logging in periodic monitoring ticker
- Ensures goroutines respect context cancellation on all platforms
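
A sketch of the guarded monitor, assuming a checkDockerAvailability helper and a 30-second interval (both illustrative):

```go
func (m *Manager) startDockerRecoveryMonitor(ctx context.Context) {
	// Initial availability check: Close() may already have cancelled
	// the context, so guard the warning with ctx.Err().
	if err := m.checkDockerAvailability(); err != nil && ctx.Err() == nil {
		m.logger.Warn("Docker unavailable", zap.Error(err))
	}

	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if ctx.Err() != nil {
				return // cancelled while the tick was being delivered
			}
			if err := m.checkDockerAvailability(); err != nil && ctx.Err() == nil {
				m.logger.Warn("Docker still unavailable", zap.Error(err))
			}
		}
	}
}
```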

Fixes Windows test failure:
- Unit Tests (windows-latest, 1.23.10)
  https://github.com/smart-mcp-proxy/mcpproxy-go/actions/runs/19045548056/job/54392774796

Complements PR #122 which fixed the same issue on macOS/Linux.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Nov 3, 2025
…ecovery retry

Fixed another goroutine leak where Docker recovery retry logging occurred
after test completion, causing panic: "Log in goroutine after test has
completed".

The issue occurred in handleDockerUnavailable when:
1. Timer fires and checkDockerAvailability is called
2. Docker check fails (returns error)
3. Test completes and context is cancelled
4. Goroutine logs "Docker recovery retry" without checking context

This completes the fix for Docker recovery monitor logging issues by
adding context checks before ALL logging statements.

Changes:
- Add context check before "Docker recovery retry" log (line 1896)
- Ensures goroutine exits cleanly if context was cancelled during check
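
The relevant arm of the retry loop, sketched as a fragment (retryDelay and checkDockerAvailability are illustrative names):

```go
case <-time.After(retryDelay):
	if err := m.checkDockerAvailability(); err != nil {
		// The availability check itself takes time; the context may
		// have been cancelled while it ran. Exit instead of logging.
		if ctx.Err() != nil {
			return
		}
		m.logger.Debug("Docker recovery retry", zap.Error(err))
		continue // wait for the next timer
	}
```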

Fixes test failures:
- End-to-End Tests (macos-latest, 1.23.10)
  https://github.com/smart-mcp-proxy/mcpproxy-go/actions/runs/19046210469

Complements PR #122 and PR #123.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Nov 3, 2025
…ellation

Fixed OAuth event monitor goroutine leak where logging occurred after test
completion, causing panic: "Log in goroutine after test has completed".

The issue occurred in startOAuthEventMonitor when:
1. Ticker fires (every 5 seconds)
2. processOAuthEvents() is called
3. Test completes and context is cancelled
4. processOAuthEvents() logs debug message without checking context

This is similar to the Docker recovery monitor issues fixed in PR #122, #123, #124.

Changes:
- Add context check after ticker fires, before calling processOAuthEvents()
- Return immediately if context was cancelled during ticker wait
- Prevents goroutine from logging after test cleanup
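
A sketch of the guarded ticker loop (processOAuthEvents is named in the commit; the rest of the wiring is assumed):

```go
func (m *Manager) startOAuthEventMonitor(ctx context.Context) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// A tick and cancellation can arrive together; return rather
			// than let processOAuthEvents log during test teardown.
			if ctx.Err() != nil {
				return
			}
			m.processOAuthEvents()
		}
	}
}
```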

Fixes test failures:
- Unit Tests (windows-latest, 1.23.10)
  https://github.com/smart-mcp-proxy/mcpproxy-go/actions/runs/19046681617/job/54396553672

Error:
```
panic: Log in goroutine after TestUpdateListenAddressValidation has completed:
2025-11-03T19:28:45.714Z	DEBUG	processOAuthEvents: checking for OAuth completion events...

goroutine 144 [running]:
mcpproxy-go/internal/upstream.(*Manager).processOAuthEvents(0xc0006b64d0)
	D:/a/mcpproxy-go/mcpproxy-go/internal/upstream/manager.go:1493 +0x9e
mcpproxy-go/internal/upstream.(*Manager).startOAuthEventMonitor(0xc0006b64d0, {0x1414d55f8, 0xc000600a50})
	D:/a/mcpproxy-go/mcpproxy-go/internal/upstream/manager.go:1474 +0x1e8
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Nov 4, 2025
…utdown

## Problem
After PR #129, tests were still failing on macOS with "panic: Log in goroutine after
test completed". The `ctx.Err()` checks narrowed the window between checking the
context and logging, but goroutines could still be running when tests completed
because there was no synchronization mechanism waiting for them to exit.

## Root Cause
The Manager struct had `shutdownCtx` and `shutdownCancel` for signaling
shutdown, but **no `sync.WaitGroup` to track background goroutines**. When
`ShutdownAll()` or `DisconnectAll()` was called:
1. Context was cancelled
2. Function returned immediately
3. Background goroutines (OAuth monitor, Docker recovery) were still running
4. Test completed and marked `testing.T` as done
5. Goroutines tried to log → panic

## Solution
Added proper goroutine lifecycle management:

1. **Manager.shutdownWg** field - Tracks all background goroutines
2. **Goroutine launches** - Call `shutdownWg.Add(1)` before spawning:
   - `startOAuthEventMonitor` (in NewManager)
   - `startDockerRecoveryMonitor` (in NewManager and dynamic restart)
3. **Goroutine exits** - Call `defer shutdownWg.Done()` in both monitor functions
4. **Shutdown waiting** - Call `shutdownWg.Wait()` in:
   - `ShutdownAll()` - with 3-second timeout
   - `DisconnectAll()` - blocking wait

## Changes
- `Manager` struct: Added `shutdownWg sync.WaitGroup`
- `NewManager()`: Added `Add(1)` before launching monitors
- `startOAuthEventMonitor()`: Added `defer Done()` at start
- `startDockerRecoveryMonitor()`: Added `defer Done()` at start
- Dynamic recovery restart: Added `Add(1)` before `go startDockerRecoveryMonitor()`
- `ShutdownAll()`: Cancel context, wait for goroutines with timeout
- `DisconnectAll()`: Cancel context, wait for goroutines
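
A compact sketch of the lifecycle (only shutdownCtx, shutdownCancel, shutdownWg, and the two monitor names come from this commit; everything else is illustrative):

```go
type Manager struct {
	shutdownCtx    context.Context
	shutdownCancel context.CancelFunc
	shutdownWg     sync.WaitGroup
}

func NewManager() *Manager {
	ctx, cancel := context.WithCancel(context.Background())
	m := &Manager{shutdownCtx: ctx, shutdownCancel: cancel}

	// Add(1) BEFORE `go ...`, never inside the goroutine, or Wait()
	// can return before the goroutine has registered itself.
	m.shutdownWg.Add(1)
	go m.startOAuthEventMonitor(ctx)
	m.shutdownWg.Add(1)
	go m.startDockerRecoveryMonitor(ctx)
	return m
}

func (m *Manager) startOAuthEventMonitor(ctx context.Context) {
	defer m.shutdownWg.Done() // first statement, so every exit path counts down
	<-ctx.Done()              // monitor loop elided
}

func (m *Manager) startDockerRecoveryMonitor(ctx context.Context) {
	defer m.shutdownWg.Done()
	<-ctx.Done() // monitor loop elided
}

func (m *Manager) ShutdownAll() {
	m.shutdownCancel()

	// sync.WaitGroup has no built-in timeout, so wait in a helper
	// goroutine and race it against a timer.
	done := make(chan struct{})
	go func() {
		m.shutdownWg.Wait()
		close(done)
	}()
	select {
	case <-done: // all background goroutines exited
	case <-time.After(3 * time.Second):
		// Give up waiting; shutdown must not hang forever.
	}
}
```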

## Testing
```bash
go test ./internal/runtime -run TestUpdateListenAddressValidation -v -race -count=5
# Result: 5/5 passed, no race conditions

go test ./internal/upstream ./internal/runtime -v -race -count=2
# Result: All tests passed
```

## Related
- Fixes persistent issue from PR #122, #126, #127, #129
- Complements the `ctx.Err()` checks added in PR #129

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>