Skip to content

Conversation

@Dumbris
Copy link
Contributor

@Dumbris Dumbris commented Nov 4, 2025

Summary

Complete the fix from PR #126 by using parent context consistently throughout handleDockerUnavailable. This eliminates the race condition where child context cancellation lags parent by nanoseconds.

Root Cause

PR #126 fixed line 1905 to use parent ctx instead of child recoveryCtx, but lines 1841 and 1846 still used recoveryCtx. This created a race condition:

  1. Test completes, parent ctx gets cancelled
  2. Timer fires at line 1843 (after 2 seconds)
  3. Code checks recoveryCtx.Done() at line 1846
  4. Child context cancellation may lag parent by nanoseconds
  5. Code falls through to default case, proceeds with logging
  6. Test framework panics: "Log in goroutine after test has completed"

Changes

  • Line 1841: recoveryCtx.Done()ctx.Done() in main select statement
  • Line 1846: recoveryCtx.Done()ctx.Done() after timer fires

Test Results

Ran TestUpdateListenAddressValidation 3 times locally - all passed without panic.

Impact

Fixes persistent test failures on macOS in workflow runs:

Generated with Claude Code

Complete the fix from PR #126 by using parent context consistently
throughout handleDockerUnavailable. This eliminates the race condition
where child context cancellation lags parent by nanoseconds.

Changes:
- Line 1841: Changed recoveryCtx.Done() -> ctx.Done() in main select
- Line 1846: Changed recoveryCtx.Done() -> ctx.Done() after timer fires

This ensures immediate shutdown detection when tests complete, preventing
"Log in goroutine after test has completed" panics.

Fixes the persistent test failure in TestUpdateListenAddressValidation
on macOS.

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Nov 4, 2025

📦 Build Artifacts for PR #127

Version: v0.9.24-dev-4c814ef-pr127
Latest tag: v0.9.24

Available Artifacts

  • frontend-dist-pr (177 KB)
  • archive-linux-amd64 (9.7 MB)
  • archive-linux-arm64 (8.8 MB)
  • archive-windows-amd64 (18 MB)
  • archive-windows-arm64 (16 MB)
  • archive-darwin-amd64 (18 MB)
  • installer-dmg-darwin-amd64 (20 MB)
  • archive-darwin-arm64 (16 MB)
  • installer-dmg-darwin-arm64 (18 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 19059587472 --repo smart-mcp-proxy/mcpproxy-go

# Or download a specific artifact:
gh run download 19059587472 --name dmg-darwin-arm64

Note: Artifacts expire in 14 days.

@Dumbris Dumbris merged commit d8b5744 into main Nov 4, 2025
20 checks passed
@Dumbris Dumbris deleted the fix/docker-recovery-timer-ctx branch November 4, 2025 06:42
Dumbris added a commit that referenced this pull request Nov 4, 2025
…ecks

Replace select-default pattern with ctx.Err() checks to eliminate TOCTOU
race condition in Docker recovery monitoring goroutines.

The select-default pattern creates a race window where context can be
cancelled AFTER the check passes but BEFORE logging happens, causing
"Log in goroutine after test has completed" panics.

Using ctx.Err() provides atomic context cancellation checking without
the race condition.

Changes:
- handleDockerUnavailable: Use ctx.Err() for all context checks before logging
- startDockerRecoveryMonitor: Use ctx.Err() in initial check and periodic monitoring
- Added ctx.Err() check before "Docker recovery successful" logging

Fixes the persistent test failure in TestUpdateListenAddressValidation
that survived PR #126 and PR #127.

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Nov 4, 2025
…utdown

## Problem
After PR #129, tests were still failing with "panic: Log in goroutine after
test completed" on macOS. The `ctx.Err()` checks prevented the race between
checking context and logging, but goroutines could still be running when tests
completed because there was no synchronization mechanism.

## Root Cause
The Manager struct had `shutdownCtx` and `shutdownCancel` for signaling
shutdown, but **no `sync.WaitGroup` to track background goroutines**. When
`ShutdownAll()` or `DisconnectAll()` was called:
1. Context was cancelled
2. Function returned immediately
3. Background goroutines (OAuth monitor, Docker recovery) were still running
4. Test completed and marked `testing.T` as done
5. Goroutines tried to log → panic

## Solution
Added proper goroutine lifecycle management:

1. **Manager.shutdownWg** field - Tracks all background goroutines
2. **Goroutine launches** - Call `shutdownWg.Add(1)` before spawning:
   - `startOAuthEventMonitor` (in NewManager)
   - `startDockerRecoveryMonitor` (in NewManager and dynamic restart)
3. **Goroutine exits** - Call `defer shutdownWg.Done()` in both monitor functions
4. **Shutdown waiting** - Call `shutdownWg.Wait()` in:
   - `ShutdownAll()` - with 3-second timeout
   - `DisconnectAll()` - blocking wait

## Changes
- `Manager` struct: Added `shutdownWg sync.WaitGroup`
- `NewManager()`: Added `Add(1)` before launching monitors
- `startOAuthEventMonitor()`: Added `defer Done()` at start
- `startDockerRecoveryMonitor()`: Added `defer Done()` at start
- Dynamic recovery restart: Added `Add(1)` before `go startDockerRecoveryMonitor()`
- `ShutdownAll()`: Cancel context, wait for goroutines with timeout
- `DisconnectAll()`: Cancel context, wait for goroutines

## Testing
```bash
go test ./internal/runtime -run TestUpdateListenAddressValidation -v -race -count=5
# Result: 5/5 passed, no race conditions

go test ./internal/upstream ./internal/runtime -v -race -count=2
# Result: All tests passed
```

## Related
- Fixes persistent issue from PR #122, #126, #127, #129
- Complements the `ctx.Err()` checks added in PR #129

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants