fix: supervisor integration testing fixes (t128.7) #384
Conversation
Caution: Review failed (the pull request is closed).

Walkthrough: Adds emergency worker-process cleanup, model selection and health checks, and threads model hints through dispatch/pulse/evaluation flows; introduces new helper functions.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client/CLI
    participant Supervisor as supervisor-helper.sh
    participant Dispatcher as Dispatch Flow
    participant Pulse as Pulse Phase
    participant Evaluator as evaluate_worker()
    participant LogParser as extract_log_metadata()
    participant HealthSvc as check_model_health()
    participant Worker as Worker Process Tree
    Client->>Supervisor: cmd_dispatch / cmd_pulse / kill-workers
    Supervisor->>HealthSvc: resolve_model() & check_model_health(model)
    HealthSvc-->>Supervisor: model health (ok|unhealthy)
    Supervisor->>Dispatcher: build_dispatch_cmd(model) -> start Worker
    Dispatcher->>Worker: spawn PID
    Pulse->>Evaluator: evaluate_worker(PID, model)
    Evaluator->>LogParser: tail logs (last 20 lines)
    LogParser-->>Evaluator: metadata + backend_error_count + exit_code
    alt backend infra or retryable
        Evaluator->>Supervisor: decide RETRY
        Supervisor->>Worker: cleanup_worker_processes(PID)
        Supervisor->>Worker: _list_descendants / _kill_descendants
    else deterministic complete
        Evaluator-->>Pulse: TASK_COMPLETE
    else unrecoverable
        Evaluator-->>Pulse: TASK_FAILED
    end
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics

Generated on: Fri Feb 6 05:59:31 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.agent/scripts/supervisor-helper.sh (1)

3446-3464: ⚠️ Potential issue | 🟡 Minor

`show_usage` is missing the `kill-workers` command and the updated `cleanup` description. The file header (lines 16-17) documents both `cleanup ... + processes` and `kill-workers`, but `show_usage()` still has the old cleanup description and omits `kill-workers` entirely.

📝 Proposed fix

```diff
- supervisor-helper.sh cleanup [--dry-run]       Clean up completed worktrees
+ supervisor-helper.sh cleanup [--dry-run]       Clean up completed worktrees + processes
+ supervisor-helper.sh kill-workers [--dry-run]  Kill orphaned worker processes (emergency)
```

Also add an `Options for 'kill-workers'` section lower in the help text:

```text
Options for 'kill-workers':
  --dry-run    Show orphaned processes without killing them
```

As per coding guidelines for `.agent/scripts/*.sh` ("Clear logging and feedback"), the help output is the primary discoverability mechanism for the new emergency command.
🤖 Fix all issues with AI agents

In @.agent/scripts/supervisor-helper.sh:

- Around lines 1699-1710: the script deletes `"$log_tail_file"` before computing `backend_error_count`, so the grep against `"$log_tail_file"` always fails and `backend_error_count` is 0. Fix by moving `rm -f "$log_tail_file"` to after the `backend_error_count` assignment (or add an existence check such as `[ -f "$log_tail_file" ]` before grepping) so the grep that sets `backend_error_count` (the line using `grep -ci 'endpoints failed\|Antigravity\|gateway.*error\|service unavailable\|503' "$log_tail_file"`) runs against the actual file and is not silenced by `2>/dev/null || echo 0`. Ensure the cleanup removal of `"$log_tail_file"` happens only after all counts (`rate_limit_count`, `auth_error_count`, `conflict_count`, `timeout_count`, `oom_count`, `backend_error_count`) are computed.
🧹 Nitpick comments (2)

.agent/scripts/supervisor-helper.sh (2)

1258-1271: Descendants only receive SIGTERM — no SIGKILL fallback. `cleanup_worker_processes` escalates to `kill -9` for the parent PID, but `_kill_descendants` only sends SIGTERM. A stubborn node/opencode child process that traps or ignores SIGTERM will survive cleanup — exactly the orphan scenario this PR aims to eliminate.

♻️ Add SIGKILL fallback for descendants

```diff
 _kill_descendants() {
   local parent_pid="$1"
   local children
   children=$(pgrep -P "$parent_pid" 2>/dev/null) || true
   if [[ -n "$children" ]]; then
     for child in $children; do
       _kill_descendants "$child"
       kill "$child" 2>/dev/null || true
     done
+    # Brief wait then force-kill any survivors
+    sleep 0.5
+    for child in $children; do
+      kill -0 "$child" 2>/dev/null && kill -9 "$child" 2>/dev/null || true
+    done
   fi
   return 0
 }
```
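With the fallback applied, the complete helper would read roughly as below. This is a sketch under the assumption that the version shown in the diff matches the script; the real function may differ in details.

```shell
#!/usr/bin/env bash
# Recursively SIGTERM a process tree, then SIGKILL anything that survives.
_kill_descendants() {
  local parent_pid="$1"
  local children
  children=$(pgrep -P "$parent_pid" 2>/dev/null) || true
  if [[ -n "$children" ]]; then
    for child in $children; do
      _kill_descendants "$child"          # depth-first: grandchildren first
      kill "$child" 2>/dev/null || true   # polite SIGTERM
    done
    # Brief grace period, then force-kill survivors that ignored SIGTERM
    sleep 0.5
    for child in $children; do
      kill -0 "$child" 2>/dev/null && kill -9 "$child" 2>/dev/null || true
    done
  fi
  return 0
}
```

Depth-first recursion matters here: killing grandchildren before their parents avoids reparenting them to PID 1 mid-cleanup, which is the orphan pattern the PR is fixing.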
1688-1690: Consider a trap for temp file cleanup in `extract_log_metadata`. If the function exits abnormally between `mktemp` and `rm -f`, the temp file leaks. Under `set -e`, an unguarded failure could trigger this. A local trap would make this bulletproof.

♻️ Trap-based temp file cleanup

```diff
 local log_tail_file
 log_tail_file=$(mktemp)
+trap "rm -f '$log_tail_file'" RETURN
 tail -20 "$log_file" > "$log_tail_file" 2>/dev/null || true
```

With this, the `rm -f "$log_tail_file"` on line 1699 can be removed (the trap handles it), which also naturally fixes the ordering bug with `backend_error_count`.
…racy

- Replace associative arrays (bash 4+) with grep-based metadata parsing
- Move dispatched_count declaration before Phase 1 (unbound variable fix)
- Add cleanup_worker_processes() to kill process trees on task completion
- Add kill-workers command for emergency orphan cleanup
- Add _list_descendants() for safe process tree protection
- Integrate process cleanup into pulse evaluation and cleanup command
- Prioritize PR URL over heuristic error patterns (prevents false positives)
- Gate heuristic error matching on non-zero exit code only
- Reduce error pattern search to last 20 log lines

Found during t128.7 integration testing: 80 orphaned opencode processes accumulated from workers whose parent shells exited but child processes (opencode binary, node MCP servers) kept running as PPID=1 orphans. Heuristic evaluation produced false positives when task content discussed authentication/errors (e.g., Bing Webmaster Tools API documentation).
Tasks could get stuck in 'evaluating' state when the AI eval (Tier 3) timed out. The pulse now includes 'evaluating' in its Phase 1 query and re-evaluates stuck tasks with --no-ai to avoid repeated timeouts.
Add backend_error_count to extract_log_metadata() for detecting Antigravity/API gateway failures. These are always transient (provider overloaded) and should retry, not block. Found during t128.7 when concurrency 4 overwhelmed the opencode backend.
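Such a detector might be sketched like this. The function name is illustrative, and the pattern list is assembled from the patterns named across the commit messages (endpoint failures, Antigravity, gateway errors, 503s, quota exhaustion); the real script may use different patterns.

```shell
#!/usr/bin/env bash
# Sketch: count log lines that look like transient backend/provider failures.
# These are treated as always-retryable, per the rationale above.
count_backend_errors() {
  local log_file="$1"
  local count
  count=$(grep -ciE 'endpoints failed|antigravity|gateway.*error|service unavailable|503|quota protection|over.*usage|quota reset' \
    "$log_file" 2>/dev/null) || true
  echo "${count:-0}"   # empty (missing file) collapses to 0
}
```

A nonzero result steers the evaluator toward RETRY rather than marking the task failed or blocked.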
- Task notifications: sound + banner on complete (Glass), blocked (Basso), failed (Sosumi)
- Batch progress: notification on each pulse with progress (Hero sound on all-complete)
- Integrated into send_task_notification() and pulse summary
- Only fires on Darwin (macOS), no-op on Linux
- afplay needs direct backgrounding (not subshell) to produce sound
- Backend error grep was reading deleted temp file; now searches full log
- say for spoken status, afplay for attention sounds
Add Quota protection, over.*usage, quota reset patterns to backend error detection. Opencode rotates providers (Antigravity, Gemini) and all can hit quota limits during high-concurrency batch dispatch.
- Add check_model_health() that probes the model with a trivial prompt before creating worktrees or burning retries (15s timeout, 5m cache)
- Pass task model to build_dispatch_cmd() so opencode uses -m flag (prevents Antigravity router from picking exhausted providers)
- Return code 3 from dispatch = provider unavailable; stops pulse dispatch loop until next cycle (avoids cascading failures)
- Detects: quota exhaustion, endpoint failures, timeouts, empty responses
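A probe-with-cache of this shape might look as follows. Everything here is a hedged stand-in: `HEALTH_PROBE_CMD` and the cache path are invented for the sketch, and the real script presumably probes via an opencode invocation rather than an arbitrary command.

```shell
#!/usr/bin/env bash
# Sketch of a pre-dispatch health probe with a 5-minute verdict cache.
HEALTH_TTL=300  # seconds, matching the "5m cache" above

check_model_health() {
  local model="$1"
  local cache_file="${TMPDIR:-/tmp}/health.${model//\//_}"
  # Serve the cached verdict while it is fresh enough
  if [[ -f "$cache_file" ]]; then
    local now mtime age
    now=$(date +%s)
    mtime=$(stat -c %Y "$cache_file" 2>/dev/null || stat -f %m "$cache_file")
    age=$(( now - mtime ))
    if (( age < HEALTH_TTL )); then
      [[ "$(cat "$cache_file")" == "ok" ]]
      return
    fi
  fi
  # Probe with a trivial prompt under a 15s timeout; any failure (quota,
  # endpoint, timeout, empty response) marks the model unhealthy
  if timeout 15 ${HEALTH_PROBE_CMD:-true} "$model" >/dev/null 2>&1; then
    echo ok > "$cache_file"
    return 0
  fi
  echo unhealthy > "$cache_file"
  return 1
}
```

The cache is what makes the probe cheap enough to run before every dispatch: within the TTL, repeated checks cost a single `stat` rather than a model call.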
- Add resolve_model() with tiers: coding (opus), eval (sonnet), health (sonnet)
- Health check uses cheap sonnet probe instead of opus
- AI eval uses resolve_model('eval') instead of hardcoded model
- Pulse Phase 4: periodic cleanup of stale worker processes every cycle
- Model priority: Anthropic SOTA via opencode > claude CLI > zen free
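The tiered selection above can be sketched as a simple dispatch table. The model identifier strings are placeholders inferred from the tiers (coding=opus, eval=sonnet, health=sonnet), not the script's actual values.

```shell
#!/usr/bin/env bash
# Sketch of tiered model selection; identifiers are illustrative placeholders.
resolve_model() {
  case "$1" in
    coding)      echo "anthropic/claude-opus"   ;;  # strongest model for code work
    eval|health) echo "anthropic/claude-sonnet" ;;  # cheaper model for probes/eval
    *)           echo "anthropic/claude-sonnet" ;;  # conservative default
  esac
}
```

Centralizing the choice in one function means the cheap-probe policy (sonnet for health checks instead of opus) is enforced everywhere the supervisor needs a model.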
A task can hit a backend error early, recover, and complete with a PR. Backend error check was running first, causing false retries on completed tasks. Now ordered: signals > PR URL > backend errors.
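The corrected ordering (signals > PR URL > backend errors) can be sketched like this. Function name, marker strings, and patterns are illustrative, not the script's real ones.

```shell
#!/usr/bin/env bash
# Sketch: evaluation checks in the corrected priority order.
evaluate_outcome() {
  local log_file="$1" exit_code="$2"
  # 1. Deterministic completion/failure signals always win
  grep -q 'TASK_COMPLETE' "$log_file" 2>/dev/null && { echo COMPLETE; return 0; }
  grep -q 'TASK_FAILED'   "$log_file" 2>/dev/null && { echo FAILED; return 0; }
  # 2. A PR URL means the task finished, even if transient errors came earlier
  grep -qE 'github\.com/[^ ]+/pull/[0-9]+' "$log_file" 2>/dev/null && { echo COMPLETE; return 0; }
  # 3. Only now do backend errors trigger a retry
  grep -qiE 'service unavailable|503|gateway.*error' "$log_file" 2>/dev/null && { echo RETRY; return 0; }
  # 4. Fall back to the worker's exit code
  [[ "$exit_code" -eq 0 ]] && echo COMPLETE || echo FAILED
}
```

Because the PR-URL check runs before the backend-error check, a task that hit a 503 early but still shipped a PR is classified COMPLETE instead of being retried.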
Background say/afplay were getting killed when the calling function returned. nohup detaches them from the process group.
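The detach fix can be sketched as below. `PLAY_CMD` is parameterized here purely for illustration and testing; the source only mentions `afplay` and `say` directly.

```shell
#!/usr/bin/env bash
# Sketch: detach an audio alert so it survives the caller's return.
play_alert() {
  local sound_file="$1"
  # nohup + background + redirected I/O detaches the player from the calling
  # function, so it is not reaped when the function's process group winds down
  nohup "${PLAY_CMD:-afplay}" "$sound_file" >/dev/null 2>&1 &
  disown 2>/dev/null || true
}
```

Without `nohup` (or an equivalent `setsid`), the backgrounded player shares the caller's process group and can be killed before the sound finishes.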
…rms) say/TTS requires the speech synthesis daemon which macOS Sequoia won't start for processes without Accessibility permissions. afplay works reliably for all process contexts. TTS can be re-enabled once Tabby is added to System Settings > Privacy & Security > Accessibility.
027ec21 to da1193c
Summary
- `kill-workers` command for emergency orphan cleanup

Integration Test Results: 12/12 COMPLETE
Bugs Fixed (10)
- `local -A` associative arrays (bash 4+) replaced with grep-based parsing
- `dispatched_count` referenced before declaration in retry handler
- Tasks stuck in `evaluating` state; now re-evaluated without AI on timeout
- `say` needs Accessibility perms; simplified to `afplay` system sounds

New Features
- `cleanup_worker_processes()` - kills process trees on task completion
- `_kill_descendants()` / `_list_descendants()` - recursive process tree management
- `kill-workers` command - emergency orphan cleanup with safety guards
- `check_model_health()` - pre-dispatch probe with 5-minute cache
- `resolve_model()` - tiered model selection (coding=opus, eval=sonnet, health=sonnet)
- `notify_batch_progress()` - macOS audio alerts on progress milestones
- `-m` flag to opencode (prevents router picking exhausted providers)

Follow-up Tasks Created
Closes t128.7