
refactor: replace deterministic supervisor lifecycle with AI-first decision engine#2206

Merged
marcusquinn merged 1 commit into main from
refactor/supervisor-ai-first
Feb 24, 2026
Conversation


@marcusquinn marcusquinn commented Feb 24, 2026

Summary

  • Remove ~1,000 lines of deterministic shell logic that prevented the supervisor from solving problems autonomously
  • Replace with AI-first architecture: GATHER (data) -> DECIDE (opus model) -> EXECUTE (action)
  • New interactive worker actions: resolve_conflicts, fix_ci, fix_and_push dispatch opus workers with full tool access to actually solve problems instead of just logging and waiting
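
The GATHER -> DECIDE -> EXECUTE split can be sketched in a few lines of bash. This is a hypothetical stand-in, not code from the PR: all three functions are stubbed (the real ai_decide calls the opus model, and the real gatherer queries the DB, GitHub, and git), and only the function names follow the description above.

```shell
#!/usr/bin/env bash
set -eu

gather_task_state() {   # GATHER: pure data collection (stubbed)
  printf '{"task_id":"%s","pr_state":"OPEN","ci_summary":"failed:1"}' "$1"
}

ai_decide() {           # DECIDE: stand-in for the opus model call
  printf '{"action":"fix_ci","reason":"1 failing check"}'
}

execute_action() {      # EXECUTE: run whatever the model chose
  echo "task $1: executing $2"
}

process_ai_lifecycle() {
  local task_id state decision action
  for task_id in "$@"; do
    state=$(gather_task_state "$task_id")
    decision=$(ai_decide "$state")
    # Pull the "action" field out of the model's JSON reply
    action=$(printf '%s' "$decision" | sed -n 's/.*"action":"\([^"]*\)".*/\1/p')
    execute_action "$task_id" "$action"
  done
}

process_ai_lifecycle T-101   # -> task T-101: executing fix_ci
```

The point of the shape is that no branch in the loop encodes policy: every "what next" question goes through the DECIDE step.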

What was removed

| Component | Lines | Problem |
| --- | --- | --- |
| fast_path_decision() | ~110 | Deterministic shortcuts that skipped AI for "obvious" cases — but couldn't handle edge cases |
| Phase 3b2 reconciliation | ~200 | Shell case statements trying to reconcile blocked/verify_failed tasks |
| Phase 3c issue sync | ~50 | Deterministic GitHub issue label sync |
| Phase 3d verified PR cleanup | ~90 | Deterministic merge/update for verified tasks with open PRs |
| Phase 3.5 rebase retry | ~60 | Retry counter with max cap — gave up instead of solving |
| Phase 3.6 escalation | ~150 | Hardcoded 10-step prompt that couldn't handle complex conflicts |
| Phase 4b2 stale pr_review | ~50 | Called legacy cmd_pr_lifecycle which has its own deterministic logic |
| process_post_pr_lifecycle() | ~110 | Legacy parallel lifecycle processor (now redirects to AI) |

What replaced it

ai_decide() — a single function that sends the task's real-world state to opus and gets back a JSON action. The AI sees the same state a human would and picks the next step. No case statements, no fast-paths, no retry counters.
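
Because the model reply is free-form text containing a JSON object, the caller has to extract and validate the action before executing anything. A minimal validation sketch — parse_ai_decision is a hypothetical helper, not the PR's code; the action names are taken from this PR's description:

```shell
# Hypothetical helper: pull "action" out of a JSON decision and reject
# anything outside the known action set, falling back to a safe "wait".
parse_ai_decision() {
  local raw="$1" action
  action=$(printf '%s' "$raw" | sed -n 's/.*"action"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  case "$action" in
    merge|wait|rebase_branch|resolve_conflicts|fix_ci|fix_and_push)
      printf '%s' "$action" ;;
    *)
      printf 'wait' ;;   # unknown or malformed decision: do nothing
  esac
}

parse_ai_decision '{"action":"fix_ci","reason":"2 checks red"}'   # -> fix_ci
```

Whitelisting the action set keeps a hallucinated or garbled model reply from ever reaching the executor.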

_dispatch_ai_worker() — for complex problems (conflicts, CI failures, unknown blockers), dispatches an interactive AI session with full tool access that can read code, understand context, and fix the actual problem.
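
The dispatch side reduces to "launch a session in the background, record its PID, check liveness later". A sketch under stated assumptions: the AI session is replaced by a sleep stub, and everything except the function name is illustrative.

```shell
# Stand-in sketch: the real worker would be an interactive claude/opencode
# session with full tool access; here it is a sleep so the control flow runs.
_dispatch_ai_worker() {
  local task_id="$1" action_type="$2"
  sleep 5 </dev/null >/dev/null 2>&1 &   # stand-in long-running AI session
  local pid=$!
  echo "dispatched ${action_type} worker for ${task_id} (pid ${pid})" >&2
  printf '%s' "$pid"                     # caller stores this for kill -0 checks
}

pid=$(_dispatch_ai_worker T-42 resolve_conflicts)
kill -0 "$pid" && echo "worker ${pid} is alive"
kill "$pid" 2>/dev/null || true          # cleanup for this demo
```

Note that `kill -0` only answers for local numeric PIDs; any stored remote worker identifier needs its own liveness path.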

Testing

  • ShellCheck clean on both files
  • Bash syntax validation passes
  • process_post_pr_lifecycle kept as backward-compatible redirect
  • extract_parent_id, adopt_untracked_prs, Phase 3b (verify queue) preserved
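
The backward-compatible redirect can be as small as a one-line wrapper. A sketch, with process_ai_lifecycle stubbed so the forwarding is visible — not the repository's actual implementation:

```shell
# Stub standing in for the real AI lifecycle entry point.
process_ai_lifecycle() { echo "ai-lifecycle: $*"; }

# Deprecated legacy entry point: kept only so existing callers keep working.
process_post_pr_lifecycle() {
  process_ai_lifecycle "$@"
}

process_post_pr_lifecycle T-1 T-2   # -> ai-lifecycle: T-1 T-2
```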

Summary by CodeRabbit

Release Notes

  • Refactor
    • Restructured core task lifecycle architecture with centralized orchestration and decision-making processes
    • Consolidated multiple workflow phases into a unified execution path
    • Enhanced handling of complex operations and task dependencies through improved dispatch capabilities
    • Expanded task state tracking with enriched metadata and comprehensive audit logging

refactor: replace deterministic supervisor lifecycle with AI-first decision engine

Remove ~1,000 lines of deterministic shell logic (fast_path_decision,
Phase 3b2 reconciliation, Phase 3c/3d/3.5/3.6, process_post_pr_lifecycle)
that prevented the supervisor from solving problems autonomously.

New architecture: GATHER -> DECIDE -> EXECUTE
- gather_task_state(): collects facts from DB, GitHub, git (pure data)
- ai_decide(): sends state to opus model, gets JSON action back
- execute_action(): runs what the AI decided; complex work dispatches
  interactive AI workers with full tool access

Key changes:
- AI model (opus) makes ALL lifecycle decisions, no deterministic shortcuts
- New actions: resolve_conflicts, fix_ci, fix_and_push dispatch interactive
  AI workers that can read code, understand context, and fix problems
- Removed SUPERVISOR_AI_LIFECYCLE toggle (always AI-first now)
- Removed Phase 3b2-3.6 deterministic reconciliation/rebase/escalation
- process_post_pr_lifecycle deprecated (redirects to process_ai_lifecycle)
- Phase 4b2 stale pr_review and 4d stuck deploying simplified

coderabbitai bot commented Feb 24, 2026

Walkthrough

The pull request restructures the supervisor's task lifecycle orchestration from deterministic branching with hard-coded gates to a fully AI-driven decision engine. It centralizes phase 3 flows into a unified process_ai_lifecycle function that gathers enriched task state, queries AI for next actions, and executes those actions in a loop, removing prior mechanical workflows.

Changes

Cohort / File(s) Summary
AI Lifecycle Engine
.agents/scripts/supervisor/ai-lifecycle.sh
Complete architectural overhaul replacing deterministic lifecycle gates with AI-driven orchestration. Functions renamed (decide_next_action → ai_decide, execute_lifecycle_action → execute_action); new _dispatch_ai_worker function for complex task delegation. Introduced enriched task state gathering (worker status, CI summary, PR metadata), structured AI decision prompts with explicit state/actions, and audit logging with decision trails. Updated model selection logic (AI_LIFECYCLE_MODEL_TIER → AI_LIFECYCLE_MODEL) and extended action execution to support worker dispatching and status-tag updates.
Pulse Phase Consolidation
.agents/scripts/supervisor/pulse.sh
Simplified Phase 3 by eliminating deterministic sub-phases (3b2, 3c, 3d, 3.5, 3.6) and routing all task decisions exclusively to AI-driven process_ai_lifecycle. Converted process_post_pr_lifecycle to a thin redirect wrapper for backward compatibility. Streamlined Phase 4d recovery logic and collapsed legacy reconciliation branches that previously fed multi-phase workflows.

Sequence Diagram

sequenceDiagram
    participant Pulse as Pulse Orchestrator
    participant AI as AI Lifecycle Engine
    participant TaskState as Task State Gatherer
    participant AIModel as AI Decision Model
    participant ActionExec as Action Executor
    participant Worker as AI Worker Dispatcher
    
    Pulse->>AI: process_ai_lifecycle(tasks)
    loop For each task
        AI->>TaskState: gather_task_state(task_id)
        TaskState->>TaskState: enrich with worker status,<br/>CI summary, PR metadata
        TaskState-->>AI: structured task state
        AI->>AIModel: ai_decide(task_state, task_id)
        AIModel->>AIModel: format prompt with<br/>CURRENT STATE &<br/>AVAILABLE ACTIONS
        AIModel-->>AI: action, reason, status_tag (JSON)
        AI->>ActionExec: execute_action(task_id, action,<br/>reason, status_tag)
        alt action == deploy or merge or complex
            ActionExec->>Worker: _dispatch_ai_worker(task_id,<br/>action_type, repo, worktree)
            Worker-->>ActionExec: worker PID & health tracking
        else action == simple (wait, cancel, retry)
            ActionExec->>ActionExec: execute inline
        end
        ActionExec->>ActionExec: update TODO.md status tags,<br/>commit changes
        ActionExec-->>AI: action result
    end
    AI-->>Pulse: all tasks processed

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

needs-review

Poem

🤖 From rigid gates to flowing streams,

AI now orchestrates the dreams—

Gather state, ask Claude to lead,

Execute actions at machine speed! 🚀

No more hard-coded paths confined,

Just prompts and workers, intertwined. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: replacing a deterministic supervisor lifecycle with an AI-first decision engine, which aligns with the substantial refactoring of supervisor orchestration in both ai-lifecycle.sh and pulse.sh. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fundamentally rearchitects the supervisor's task lifecycle management by shifting from a rule-based, deterministic system to an AI-driven decision engine. The change aims to enhance the supervisor's autonomy and problem-solving capabilities, allowing it to dynamically respond to various task states and issues without relying on hardcoded heuristics. This refactor streamlines the codebase and empowers the system to handle complex scenarios more intelligently.

Highlights

  • AI-First Decision Engine: Replaced approximately 1,000 lines of deterministic shell logic with an AI-first architecture for the supervisor lifecycle, enabling autonomous problem-solving.
  • Interactive AI Worker Actions: Introduced new interactive worker actions such as 'resolve_conflicts', 'fix_ci', and 'fix_and_push' that dispatch AI workers with full tool access to address complex issues.
  • Consolidated Lifecycle Logic: Consolidated various previously deterministic phases (e.g., fast-path decisions, reconciliation, rebase retries, escalation) into a single AI-driven decision process.
Changelog
  • .agents/scripts/supervisor/ai-lifecycle.sh
    • Updated file header comments to reflect the new AI-first architecture and its GATHER → DECIDE → EXECUTE flow.
    • Removed AI_LIFECYCLE_DECISION_TIMEOUT and AI_LIFECYCLE_MODEL_TIER variables, replacing them with AI_LIFECYCLE_MODEL and AI_LIFECYCLE_TIMEOUT.
    • Modified gather_task_state to include worker_pid in the database query and WORKER_ALIVE in the output, and simplified CI status reporting.
    • Renamed decide_next_action to ai_decide and significantly updated its prompt with new available actions and decision rules, removing deterministic shortcuts.
    • Added logging for AI decisions to an audit trail directory.
    • Renamed execute_lifecycle_action to execute_action and refactored its logic to dispatch interactive AI workers for complex actions like resolve_conflicts, fix_ci, and fix_and_push.
    • Removed the fix_ci_failures function, as its functionality is now handled by the generic _dispatch_ai_worker.
    • Removed process_task_lifecycle and fast_path_decision functions, as their logic is integrated into the new AI-first flow.
    • Re-added update_task_status_tag and commit_status_tag_updates functions at the end of the file for backward compatibility with other modules.
    • Updated process_ai_lifecycle to query all task states, remove fast-path decisions, and utilize the new ai_decide and execute_action functions.
  • .agents/scripts/supervisor/pulse.sh
    • Modified Phase 3 to exclusively use process_ai_lifecycle, removing the conditional fallback to process_post_pr_lifecycle.
    • Removed or simplified logic for Phase 3b2 reconciliation, Phase 3c issue sync, Phase 3d verified PR cleanup, Phase 3.5 rebase retry, and Phase 3.6 escalation, noting that these are now handled by the AI lifecycle.
    • Simplified Phase 4d's stuck deploying recovery logic to directly force tasks to 'deployed' after a timeout.
    • Changed the process_post_pr_lifecycle function to a thin redirect that calls process_ai_lifecycle for backward compatibility.
Activity
  • The pull request author, marcusquinn, initiated this significant refactor to transition the supervisor's lifecycle management to an AI-first approach.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 43 code smells

[INFO] Recent monitoring activity:
Tue Feb 24 12:40:43 UTC 2026: Code review monitoring started
Tue Feb 24 12:40:44 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 43

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 43
  • VULNERABILITIES: 0

Generated on: Tue Feb 24 12:40:47 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is an impressive and significant refactoring that replaces a complex, deterministic shell-based lifecycle engine with a much cleaner and more powerful AI-first architecture. The GATHER -> DECIDE -> EXECUTE pattern is well-implemented, and delegating all decisions to the AI model while using shell scripts for data gathering and execution is a solid design. The introduction of specialized AI workers for complex tasks like conflict resolution and CI fixes is a major step forward in autonomous problem-solving. My review comments focus on improving the robustness and debuggability of the new implementation by adhering to the repository's general rules, specifically around not suppressing stderr for critical command-line tools. This will ensure that any underlying issues with gh, jq, or the AI CLIs are visible in the logs, which is crucial for maintaining such a system.

Comment on lines 250 to 264
  if [[ "$ai_cli" == "opencode" ]]; then
-     ai_result=$(portable_timeout "$ai_timeout" opencode run \
+     ai_result=$(portable_timeout "$AI_LIFECYCLE_TIMEOUT" opencode run \
          -m "$ai_model" \
          --format default \
-         --title "lifecycle-decision-$$" \
+         --title "lifecycle-${task_id}-$$" \
          "$prompt" 2>/dev/null || echo "")
      # Strip ANSI codes
      ai_result=$(printf '%s' "$ai_result" | sed 's/\x1b\[[0-9;]*[mGKHF]//g; s/\x1b\[[0-9;]*[A-Za-z]//g; s/\x1b\]//g; s/\x07//g')
  else
      local claude_model="${ai_model#*/}"
-     ai_result=$(portable_timeout "$ai_timeout" claude \
+     ai_result=$(portable_timeout "$AI_LIFECYCLE_TIMEOUT" claude \
          -p "$prompt" \
          --model "$claude_model" \
          --output-format text 2>/dev/null || echo "")
  fi


critical

The calls to the AI CLIs (opencode and claude) are suppressing stderr using 2>/dev/null. This is highly risky as it will hide critical errors such as authentication failures (e.g., missing API keys), network issues, or problems with the model endpoint itself. According to the repository's general rules, blanket error suppression should be avoided to maintain debuggability. If the AI decision engine fails silently, the entire lifecycle process will be compromised. Please remove the 2>/dev/null to ensure any errors from the AI CLIs are logged.

Suggested change
if [[ "$ai_cli" == "opencode" ]]; then
ai_result=$(portable_timeout "$AI_LIFECYCLE_TIMEOUT" opencode run \
-m "$ai_model" \
--format default \
--title "lifecycle-${task_id}-$$" \
"$prompt" || echo "")
# Strip ANSI codes
ai_result=$(printf '%s' "$ai_result" | sed 's/\x1b\[[0-9;]*[mGKHF]//g; s/\x1b\[[0-9;]*[A-Za-z]//g; s/\x1b\]//g; s/\x07//g')
else
local claude_model="${ai_model#*/}"
ai_result=$(portable_timeout "$AI_LIFECYCLE_TIMEOUT" claude \
-p "$prompt" \
--model "$claude_model" \
--output-format text || echo "")
fi
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

Comment on lines 345 to 346
base_ref=$(gh pr view "$pr_number" --repo "$pr_repo_slug" \
--json baseRefName --jq '.baseRefName' 2>/dev/null) || base_ref=""


high

The gh pr view command is suppressing stderr with 2>/dev/null. This can hide important errors related to authentication, network connectivity, or the PR not being found. Per the repository's general rules, stderr should not be suppressed for such commands to aid in debugging. Please remove the 2>/dev/null.

Suggested change
base_ref=$(gh pr view "$pr_number" --repo "$pr_repo_slug" \
--json baseRefName --jq '.baseRefName' 2>/dev/null) || base_ref=""
base_ref=$(gh pr view "$pr_number" --repo "$pr_repo_slug" \
--json baseRefName --jq '.baseRefName') || base_ref=""
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

Comment on lines +803 to +804
base_ref=$(gh pr view "$tpr" --repo "$(detect_repo_slug "$trepo" 2>/dev/null || echo "")" \
--json baseRefName --jq '.baseRefName' 2>/dev/null) || base_ref="main"


high

The gh pr view command here suppresses stderr via 2>/dev/null. This is problematic as it can hide errors from the gh CLI, such as authentication failures or if the PR URL is invalid. The repository's general rules advise against suppressing stderr for such commands to ensure errors are visible for debugging. Please remove 2>/dev/null.

Suggested change
base_ref=$(gh pr view "$tpr" --repo "$(detect_repo_slug "$trepo" 2>/dev/null || echo "")" \
--json baseRefName --jq '.baseRefName' 2>/dev/null) || base_ref="main"
base_ref=$(gh pr view "$tpr" --repo "$(detect_repo_slug "$trepo" 2>/dev/null || echo "")" \
--json baseRefName --jq '.baseRefName') || base_ref="main"
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

Comment on lines 88 to 115
  pr_state=$(printf '%s' "$pr_json" | jq -r '.state // "UNKNOWN"' 2>/dev/null || echo "UNKNOWN")
  pr_merge_state=$(printf '%s' "$pr_json" | jq -r '.mergeStateStatus // "UNKNOWN"' 2>/dev/null || echo "UNKNOWN")
+ pr_review_decision=$(printf '%s' "$pr_json" | jq -r '.reviewDecision // "NONE"' 2>/dev/null || echo "NONE")
  pr_base_ref=$(printf '%s' "$pr_json" | jq -r '.baseRefName // "main"' 2>/dev/null || echo "main")

- # Retry once if UNKNOWN (GitHub lazy-loads mergeStateStatus)
- if [[ "$pr_merge_state" == "UNKNOWN" ]]; then
-     sleep 2
-     local retry_json
-     retry_json=$(gh pr view "$pr_number" --repo "$pr_repo_slug" \
-         --json mergeable,mergeStateStatus 2>/dev/null || echo "")
-     if [[ -n "$retry_json" ]]; then
-         pr_merge_state=$(printf '%s' "$retry_json" | jq -r '.mergeStateStatus // "UNKNOWN"' 2>/dev/null || echo "UNKNOWN")
-     fi
- fi

  local is_draft
  is_draft=$(printf '%s' "$pr_json" | jq -r '.isDraft // false' 2>/dev/null || echo "false")
  if [[ "$is_draft" == "true" ]]; then
      pr_state="DRAFT"
  fi

- pr_review_decision=$(printf '%s' "$pr_json" | jq -r '.reviewDecision // "NONE"' 2>/dev/null || echo "NONE")

- # Summarize CI status
+ # CI summary
  local check_rollup
  check_rollup=$(printf '%s' "$pr_json" | jq -r '.statusCheckRollup // []' 2>/dev/null || echo "[]")
  if [[ "$check_rollup" != "[]" && "$check_rollup" != "null" ]]; then
-     local pending failed passed
+     local pending failed passed total
      pending=$(printf '%s' "$check_rollup" | jq '[.[] | select(.status == "IN_PROGRESS" or .status == "QUEUED" or .status == "PENDING")] | length' 2>/dev/null || echo "0")
      failed=$(printf '%s' "$check_rollup" | jq '[.[] | select((.conclusion | test("FAILURE|TIMED_OUT|ACTION_REQUIRED")) or .state == "FAILURE" or .state == "ERROR")] | length' 2>/dev/null || echo "0")
      passed=$(printf '%s' "$check_rollup" | jq '[.[] | select(.conclusion == "SUCCESS" or .state == "SUCCESS")] | length' 2>/dev/null || echo "0")
-     pr_ci_status="passed:${passed} failed:${failed} pending:${pending}"
-
-     # Extract names of failed checks for fix_ci routing
-     local failed_check_names
-     failed_check_names=$(printf '%s' "$check_rollup" | jq -r '[.[] | select((.conclusion | test("FAILURE|TIMED_OUT|ACTION_REQUIRED")) or .state == "FAILURE" or .state == "ERROR") | .name] | join(",")' 2>/dev/null || echo "")
-     if [[ -n "$failed_check_names" ]]; then
-         pr_ci_failed_checks="$failed_check_names"
+     total=$(printf '%s' "$check_rollup" | jq 'length' 2>/dev/null || echo "0")
+     pr_ci_summary="total:${total} passed:${passed} failed:${failed} pending:${pending}"
+
+     # Names of failed checks
+     local failed_names
+     failed_names=$(printf '%s' "$check_rollup" | jq -r '[.[] | select((.conclusion | test("FAILURE|TIMED_OUT|ACTION_REQUIRED")) or .state == "FAILURE" or .state == "ERROR") | .name] | join(", ")' 2>/dev/null || echo "")
+     if [[ -n "$failed_names" ]]; then
+         pr_ci_failed_names="$failed_names"
      fi


medium

Throughout this block, jq is called with 2>/dev/null, which suppresses standard error. According to the repository's general rules, stderr should not be suppressed for commands like jq to ensure that syntax or system errors are visible for debugging. While the || echo ... provides a fallback, hiding the actual error from jq makes it harder to diagnose issues with the JSON processing logic or malformed input from the gh command.

Please remove 2>/dev/null from these jq calls.

 					pr_state=$(printf '%s' "$pr_json" | jq -r '.state // "UNKNOWN"' || echo "UNKNOWN")
 					pr_merge_state=$(printf '%s' "$pr_json" | jq -r '.mergeStateStatus // "UNKNOWN"' || echo "UNKNOWN")
 					pr_review_decision=$(printf '%s' "$pr_json" | jq -r '.reviewDecision // "NONE"' || echo "NONE")
 					pr_base_ref=$(printf '%s' "$pr_json" | jq -r '.baseRefName // "main"' || echo "main")

 					local is_draft
 					is_draft=$(printf '%s' "$pr_json" | jq -r '.isDraft // false' || echo "false")
 					if [[ "$is_draft" == "true" ]]; then
 						pr_state="DRAFT"
 					fi

 					# CI summary
 					local check_rollup
 					check_rollup=$(printf '%s' "$pr_json" | jq -r '.statusCheckRollup // []' || echo "[]")
 					if [[ "$check_rollup" != "[]" && "$check_rollup" != "null" ]]; then
 						local pending failed passed total
 						pending=$(printf '%s' "$check_rollup" | jq '[.[] | select(.status == "IN_PROGRESS" or .status == "QUEUED" or .status == "PENDING")] | length' || echo "0")
 						failed=$(printf '%s' "$check_rollup" | jq '[.[] | select((.conclusion | test("FAILURE|TIMED_OUT|ACTION_REQUIRED")) or .state == "FAILURE" or .state == "ERROR")] | length' || echo "0")
 						passed=$(printf '%s' "$check_rollup" | jq '[.[] | select(.conclusion == "SUCCESS" or .state == "SUCCESS")] | length' || echo "0")
 						total=$(printf '%s' "$check_rollup" | jq 'length' || echo "0")
 						pr_ci_summary="total:${total} passed:${passed} failed:${failed} pending:${pending}"

 						# Names of failed checks
 						local failed_names
 						failed_names=$(printf '%s' "$check_rollup" | jq -r '[.[] | select((.conclusion | test("FAILURE|TIMED_OUT|ACTION_REQUIRED")) or .state == "FAILURE" or .state == "ERROR") | .name] | join(", ")' || echo "")
 						if [[ -n "$failed_names" ]]; then
 							pr_ci_failed_names="$failed_names"
 						fi
References
  1. In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.


# Validate required fields
local action
action=$(printf '%s' "$json_block" | jq -r '.action // ""' 2>/dev/null || echo "")


medium

This jq call suppresses stderr using 2>/dev/null, which violates the repository's general rule about not hiding errors from commands. If the JSON block is malformed, this will fail silently, making it harder to debug why an action wasn't parsed correctly. Please remove 2>/dev/null to allow potential jq errors to be logged.

Suggested change
action=$(printf '%s' "$json_block" | jq -r '.action // ""' 2>/dev/null || echo "")
action=$(printf '%s' "$json_block" | jq -r '.action // ""' || echo "")
References
  1. In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor/ai-lifecycle.sh:
- Around line 397-406: The rebase_branch case only increments rebase_attempts on
success; move or duplicate the increment logic so rebase_attempts is increased
on every attempt (success or failure). Specifically, ensure the db update that
reads current_attempts and writes rebase_attempts = $((current_attempts + 1))
(the commands using current_attempts, db "$SUPERVISOR_DB" "SELECT ...", and db
"$SUPERVISOR_DB" "UPDATE ...") runs regardless of whether rebase_sibling_pr
succeeds — e.g., perform the SELECT/UPDATE immediately after calling
rebase_sibling_pr (or in a finally-style block) before returning so the counter
always increments.
- Around line 122-131: The current check uses kill -0 on tpid which can be in
remote:host:pid form and will incorrectly report remote workers as dead; update
the block that sets worker_alive to detect remote PID formats (check if tpid
contains ':' or matches a remote pattern) before calling kill -0. If tpid looks
remote (e.g., contains two colons or matches remote:host:pid), set worker_alive
to a remote indicator like "remote (tpid)" or extract the real PID after the
last ':' and handle accordingly; only run kill -0 when tpid is a plain numeric
PID. Ensure changes are applied around the tpid / worker_alive logic that
currently does kill -0.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f414143 and 09c3f22.

📒 Files selected for processing (2)
  • .agents/scripts/supervisor/ai-lifecycle.sh
  • .agents/scripts/supervisor/pulse.sh

Comment on lines +122 to +131
# Worker process state
local worker_alive="unknown"
if [[ -n "$tpid" && "$tpid" != "0" ]]; then
if kill -0 "$tpid" 2>/dev/null; then
worker_alive="yes"
else
worker_alive="no (PID $tpid dead)"
fi
else
worker_alive="no worker"


⚠️ Potential issue | 🟠 Major

Handle remote worker PID formats to avoid premature AI actions.

worker_pid can be stored as remote:host:pid (see pulse.sh remote dispatch handling). kill -0 will mark those as dead and the AI may take corrective actions while the remote worker is still running.

🛠️ Suggested fix to detect remote workers
 	local worker_alive="unknown"
 	if [[ -n "$tpid" && "$tpid" != "0" ]]; then
-		if kill -0 "$tpid" 2>/dev/null; then
-			worker_alive="yes"
-		else
-			worker_alive="no (PID $tpid dead)"
-		fi
+		if [[ "$tpid" == remote:* ]]; then
+			local _remote_host _remote_pid remote_helper
+			_remote_host=$(printf '%s' "$tpid" | cut -d: -f2)
+			_remote_pid=$(printf '%s' "$tpid" | cut -d: -f3)
+			remote_helper="${SCRIPT_DIR}/../remote-dispatch-helper.sh"
+			if [[ -x "$remote_helper" ]] && "$remote_helper" status "$task_id" "$_remote_host" >/dev/null 2>&1; then
+				worker_alive="yes (remote)"
+			else
+				worker_alive="no (remote worker dead)"
+			fi
+		elif kill -0 "$tpid" 2>/dev/null; then
+			worker_alive="yes"
+		else
+			worker_alive="no (PID $tpid dead)"
+		fi
 	else
 		worker_alive="no worker"
 	fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/ai-lifecycle.sh around lines 122 - 131, The
current check uses kill -0 on tpid which can be in remote:host:pid form and will
incorrectly report remote workers as dead; update the block that sets
worker_alive to detect remote PID formats (check if tpid contains ':' or matches
a remote pattern) before calling kill -0. If tpid looks remote (e.g., contains
two colons or matches remote:host:pid), set worker_alive to a remote indicator
like "remote (tpid)" or extract the real PID after the last ':' and handle
accordingly; only run kill -0 when tpid is a plain numeric PID. Ensure changes
are applied around the tpid / worker_alive logic that currently does kill -0.

Comment on lines 397 to +406
  rebase_branch)
      log_info "ai-lifecycle: rebasing branch for $task_id"
      update_task_status_tag "$task_id" "rebasing" "$repo_path"

      if rebase_sibling_pr "$task_id" 2>>"$SUPERVISOR_LOG"; then
          log_success "ai-lifecycle: rebase succeeded for $task_id"
          update_task_status_tag "$task_id" "ci-running" "$repo_path"
+         # Increment rebase counter
+         local current_attempts
+         current_attempts=$(db "$SUPERVISOR_DB" "SELECT rebase_attempts FROM tasks WHERE id = '$escaped_id';" 2>/dev/null || echo "0")
+         db "$SUPERVISOR_DB" "UPDATE tasks SET rebase_attempts = $((current_attempts + 1)) WHERE id = '$escaped_id';" 2>/dev/null || true
          return 0
-     else
-         log_warn "ai-lifecycle: rebase failed for $task_id"
-         update_task_status_tag "$task_id" "has-conflicts" "$repo_path"
-         return 1
      fi
+     log_warn "ai-lifecycle: rebase failed for $task_id"
+     return 1
⚠️ Potential issue | 🟠 Major

Increment rebase_attempts on every try, not just success.

Right now failures don’t increment, so the “rebase_attempts > 3 → resolve_conflicts” guard may never trigger, leading to infinite rebase loops.

🛠️ Suggested fix to count every attempt

```diff
 	rebase_branch)
-		if rebase_sibling_pr "$task_id" 2>>"$SUPERVISOR_LOG"; then
-			log_success "ai-lifecycle: rebase succeeded for $task_id"
-			local current_attempts
-			current_attempts=$(db "$SUPERVISOR_DB" "SELECT rebase_attempts FROM tasks WHERE id = '$escaped_id';" 2>/dev/null || echo "0")
-			db "$SUPERVISOR_DB" "UPDATE tasks SET rebase_attempts = $((current_attempts + 1)) WHERE id = '$escaped_id';" 2>/dev/null || true
-			return 0
-		fi
+		local current_attempts
+		current_attempts=$(db "$SUPERVISOR_DB" "SELECT rebase_attempts FROM tasks WHERE id = '$escaped_id';" 2>/dev/null || echo "0")
+		db "$SUPERVISOR_DB" "UPDATE tasks SET rebase_attempts = $((current_attempts + 1)) WHERE id = '$escaped_id';" 2>/dev/null || true
+		if rebase_sibling_pr "$task_id" 2>>"$SUPERVISOR_LOG"; then
+			log_success "ai-lifecycle: rebase succeeded for $task_id"
+			return 0
+		fi
 		log_warn "ai-lifecycle: rebase failed for $task_id"
 		return 1
 		;;
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```shell
	rebase_branch)
		log_info "ai-lifecycle: rebasing branch for $task_id"
		update_task_status_tag "$task_id" "rebasing" "$repo_path"

		# Increment rebase counter on every attempt, success or failure,
		# so the rebase_attempts > 3 escalation guard can actually trigger.
		local current_attempts
		current_attempts=$(db "$SUPERVISOR_DB" "SELECT rebase_attempts FROM tasks WHERE id = '$escaped_id';" 2>/dev/null || echo "0")
		db "$SUPERVISOR_DB" "UPDATE tasks SET rebase_attempts = $((current_attempts + 1)) WHERE id = '$escaped_id';" 2>/dev/null || true

		if rebase_sibling_pr "$task_id" 2>>"$SUPERVISOR_LOG"; then
			log_success "ai-lifecycle: rebase succeeded for $task_id"
			update_task_status_tag "$task_id" "ci-running" "$repo_path"
			return 0
		fi
		log_warn "ai-lifecycle: rebase failed for $task_id"
		update_task_status_tag "$task_id" "has-conflicts" "$repo_path"
		return 1
		;;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/ai-lifecycle.sh around lines 397-406: the `rebase_branch` case only increments `rebase_attempts` on success; move or duplicate the increment logic so `rebase_attempts` increases on every attempt, success or failure. Specifically, ensure the db SELECT/UPDATE pair that reads `current_attempts` and writes `rebase_attempts = $((current_attempts + 1))` runs regardless of whether `rebase_sibling_pr` succeeds, e.g. by performing the SELECT/UPDATE immediately after calling `rebase_sibling_pr` (or in a finally-style block) before returning, so the counter always increments.
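The failure mode the review describes can be shown with a few lines of shell. This is a minimal sketch, not the supervisor's code: `attempt_rebase`, `MAX_REBASE_ATTEMPTS`, and the `ACTION` variable are hypothetical stand-ins for `rebase_sibling_pr`, the "> 3" guard, and the AI-chosen action.

```shell
#!/bin/sh
# Sketch: bump the counter BEFORE trying, so failed attempts also move
# the task toward the resolve_conflicts escalation instead of looping.
MAX_REBASE_ATTEMPTS=3
rebase_attempts=0

attempt_rebase() { return 1; }  # stub: always fails, like a stuck conflict

next_action() {
	# Counting here (before the try) is what the suggested fix changes.
	rebase_attempts=$((rebase_attempts + 1))
	if [ "$rebase_attempts" -gt "$MAX_REBASE_ATTEMPTS" ]; then
		ACTION="resolve_conflicts"   # hand off to an interactive AI worker
	elif attempt_rebase; then
		ACTION="rebased"
	else
		ACTION="retry"
	fi
}

next_action; echo "$ACTION"   # retry  (attempt 1)
next_action; echo "$ACTION"   # retry  (attempt 2)
next_action; echo "$ACTION"   # retry  (attempt 3)
next_action; echo "$ACTION"   # resolve_conflicts (attempt 4 trips the guard)
```

With the original increment-on-success-only logic, a rebase that always fails would leave the counter at 0 and `next_action` would print "retry" forever; counting every attempt guarantees the guard trips on the fourth try.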

@marcusquinn marcusquinn merged commit dee2573 into main Feb 24, 2026
26 of 27 checks passed
@marcusquinn marcusquinn deleted the refactor/supervisor-ai-first branch February 24, 2026 12:50