
Conversation

@Dumbris Dumbris commented Nov 4, 2025

Problem

After PR #129, tests were still failing on macOS with panic: Log in goroutine after test completed. The ctx.Err() checks added in PR #129 narrowed the window between checking context state and logging, but goroutines could still be running when a test completed, because there was no synchronization mechanism to wait for them.
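
A minimal sketch of the failure mode (a hypothetical test, not code from this repo; imports of context and testing assumed):

func TestLeakyGoroutine(t *testing.T) {
    ctx, cancel := context.WithCancel(context.Background())
    go func() {
        <-ctx.Done()
        // Nothing stops the test from returning before this line runs.
        // Calling t.Log (or a zaptest logger) after the test completes
        // triggers exactly this panic.
        t.Log("background goroutine shutting down")
    }()
    cancel()
    // Test returns here without waiting for the goroutine.
}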

Root Cause

The Manager struct had:

  • shutdownCtx and shutdownCancel for signaling shutdown
  • NO sync.WaitGroup to track background goroutines

When ShutdownAll() or DisconnectAll() was called:

  1. Context was cancelled (goroutines told to exit)
  2. Function returned immediately
  3. Background goroutines (OAuth monitor, Docker recovery) were still running
  4. Test completed and marked testing.T as done
  5. Goroutines tried to log → panic!

Even with ctx.Err() checks, there's a race window between the check and the log statement. The proper fix is to wait for goroutines to actually exit.
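
Schematically, the window looks like this (illustrative, not the repo's exact code):

if ctx.Err() != nil {
    return // context already cancelled, skip logging
}
// The test can complete right here, after the check passes but
// before the log call below fires, so the panic is still possible.
m.logger.Info("monitor stopped")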

Solution

Added proper goroutine lifecycle management with sync.WaitGroup:

1. Manager.shutdownWg Field

type Manager struct {
    // ...
    shutdownWg sync.WaitGroup // Tracks all background goroutines for clean shutdown
}

2. Track Goroutine Launches

In NewManager():

if boltStorage != nil {
    manager.shutdownWg.Add(1)
    go manager.startOAuthEventMonitor(shutdownCtx)
}

if storageMgr != nil {
    manager.shutdownWg.Add(1)
    go manager.startDockerRecoveryMonitor(shutdownCtx)
}
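
(The ordering matters: Add(1) must run before the go statement. If the goroutine called Add on itself, a concurrent Wait() could observe a zero counter and return before the goroutine was ever counted.)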

Dynamic restart:

m.shutdownWg.Add(1)
go m.startDockerRecoveryMonitor(ctx)

3. Track Goroutine Exits

func (m *Manager) startOAuthEventMonitor(ctx context.Context) {
    defer m.shutdownWg.Done()
    // ... monitor logic ...
}

func (m *Manager) startDockerRecoveryMonitor(ctx context.Context) {
    defer m.shutdownWg.Done()
    // ... monitor logic ...
}
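
In outline, each monitor body is a select loop on the shutdown context (a sketch; the real loops also handle their respective events, and the poll interval here is made up):

func (m *Manager) startDockerRecoveryMonitor(ctx context.Context) {
    defer m.shutdownWg.Done() // runs on every return path

    ticker := time.NewTicker(30 * time.Second) // hypothetical interval
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return // shutdown signalled; the defer fires Done()
        case <-ticker.C:
            // ... check Docker containers, restart if needed ...
        }
    }
}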

4. Wait During Shutdown

ShutdownAll() - with timeout:

// Cancel shutdown context
if m.shutdownCancel != nil {
    m.shutdownCancel()
}

// Wait for background goroutines to exit (with timeout)
waitDone := make(chan struct{})
go func() {
    m.shutdownWg.Wait()
    close(waitDone)
}()

select {
case <-waitDone:
    m.logger.Info("All background goroutines exited cleanly")
case <-time.After(3 * time.Second):
    m.logger.Warn("Background goroutines did not exit within 3 seconds, proceeding with shutdown")
case <-ctx.Done():
    m.logger.Warn("Shutdown context cancelled while waiting for goroutines")
}
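
The helper goroutine around Wait() is needed because sync.WaitGroup exposes no channel or deadline API; closing a channel once Wait() returns is the standard way to race it against a timer. One trade-off worth noting: if the timeout fires first, that helper goroutine stays blocked in Wait() until the stragglers finally exit, which is acceptable here since the process is shutting down anyway.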

DisconnectAll() - blocking wait:

if m.shutdownCancel != nil {
    m.shutdownCancel()
}

m.shutdownWg.Wait() // Block until all goroutines exit

Testing

Local Testing (5 consecutive runs with race detector)

$ go test ./internal/runtime -run TestUpdateListenAddressValidation -v -race -count=5
=== RUN   TestUpdateListenAddressValidation
    logger.go:146: 2025-11-04T10:10:51.450+0200	INFO	Shutting down all upstream servers
    logger.go:146: 2025-11-04T10:10:51.450+0200	INFO	OAuth event monitor stopped due to context cancellation
    logger.go:146: 2025-11-04T10:10:51.451+0200	INFO	All background goroutines exited cleanly
--- PASS: TestUpdateListenAddressValidation (0.30s)
...
PASS (5/5 runs)

Comprehensive Test Suite

$ go test ./internal/upstream ./internal/runtime -v -race -count=2
PASS (all tests passed)

Changes Summary

  • Manager struct: Added shutdownWg sync.WaitGroup field
  • NewManager(): Added shutdownWg.Add(1) before goroutine launches (2 locations)
  • Dynamic restart: Added shutdownWg.Add(1) before restarting recovery monitor
  • startOAuthEventMonitor(): Added defer shutdownWg.Done() at function start
  • startDockerRecoveryMonitor(): Added defer shutdownWg.Done() at function start
  • ShutdownAll(): Cancel context + wait for goroutines with 3-second timeout
  • DisconnectAll(): Cancel context + blocking wait for goroutines

Related Issues

  • Fixes the persistent issue from PR #122, #126, #127, and #129
  • Complements the ctx.Err() checks added in PR #129

Why This Fix Works

The previous fixes (PRs #122-#129) focused on preventing logging after context cancellation. This fix ensures goroutines actually finish before tests complete. The WaitGroup provides synchronization that:

  1. Registers each background goroutine before it launches (Add(1))
  2. Signals when each goroutine exits (defer Done())
  3. Blocks shutdown until all goroutines finish (Wait())
  4. Bounds the wait with a timeout in ShutdownAll() so a stuck goroutine cannot hang shutdown

This is the correct and idiomatic Go pattern for managing background goroutine lifecycles.
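
As a self-contained illustration of the pattern (generic worker names, not this repo's code):

package main

import (
    "context"
    "fmt"
    "sync"
)

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    var wg sync.WaitGroup

    for i := 0; i < 3; i++ {
        wg.Add(1) // register before launching, never inside the goroutine
        go func(id int) {
            defer wg.Done() // signal exit on every return path
            <-ctx.Done()    // block until shutdown is signalled
            fmt.Printf("worker %d exiting\n", id)
        }(i)
    }

    cancel()  // step 1: tell workers to stop
    wg.Wait() // step 2: wait until they have actually stopped
    fmt.Println("all workers exited; safe to finish")
}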

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>


github-actions bot commented Nov 4, 2025

📦 Build Artifacts for PR #130

Version: v0.9.24-dev-31670c4-pr130
Latest tag: v0.9.24

Available Artifacts

  • frontend-dist-pr (177 KB)
  • archive-linux-amd64 (9.7 MB)
  • archive-linux-arm64 (8.8 MB)
  • archive-windows-arm64 (16 MB)
  • archive-darwin-arm64 (16 MB)
  • installer-dmg-darwin-arm64 (18 MB)
  • archive-darwin-amd64 (18 MB)
  • installer-dmg-darwin-amd64 (20 MB)
  • archive-windows-amd64 (18 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 19062084036 --repo smart-mcp-proxy/mcpproxy-go

# Or download a specific artifact:
gh run download 19062084036 --name dmg-darwin-arm64

Note: Artifacts expire in 14 days.

@Dumbris Dumbris merged commit 5eee4c8 into main Nov 4, 2025
20 checks passed
@Dumbris Dumbris deleted the fix/goroutine-leak-waitgroup branch November 4, 2025 08:32