t1301: fix rc=143 SIGTERM kills during AI reasoning phase #2120
marcusquinn merged 2 commits into main from
Summary of Changes
Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves a critical issue where AI reasoning processes were being prematurely terminated by the supervisor due to conflicts arising from concurrent pulse executions. By adjusting lock and AI timeouts and introducing a PID file-based protection mechanism, the changes ensure that AI operations can complete reliably without interruption, significantly improving the stability of AI-driven tasks.
Walkthrough
This PR addresses rc=143 supervisor pipeline kills during concurrent AI reasoning by extending the pulse lock timeout from 600s to 1800s, raising the AI timeout from 300s to 600s, and implementing AI session PID protection in the Phase 14 and Phase 4e cleanup logic. The changes prevent Phase 4e cleanup from prematurely terminating AI-spawned processes during overlapping pulse cycles.
Sequence Diagram
sequenceDiagram
actor Pulse as Pulse Cycle
participant Phase14 as Phase 14: AI Supervisor
participant PID as PID Registry
participant AI as AI Pipeline
participant Phase4e as Phase 4e: Cleanup
Pulse->>Phase14: Start AI supervision
Phase14->>PID: Write ai-supervisor.pid
Phase14->>AI: Launch AI pipeline
AI->>AI: Execute reasoning/prompt
Note over Pulse,Phase4e: Concurrent pulse arrives
Phase4e->>PID: Check protection (ai-supervisor.pid exists)
Phase4e->>Phase4e: Skip AI supervisor & descendants
Phase4e->>PID: Remove other stale PIDs
AI->>AI: Complete work
Phase14->>PID: Remove ai-supervisor.pid
Phase14->>Pulse: AI session complete
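The lifecycle in the diagram can be sketched as a write/run/remove bracket around the long-running step. This is a minimal illustration under stated assumptions, not the code in pulse.sh: the PID directory comes from mktemp, and run_pipeline_stub is a hypothetical stand-in for the real AI pipeline.

```shell
# Sketch of the PID-file bracket around a long-running step.
# pid_dir and run_pipeline_stub are hypothetical stand-ins.
pid_dir=$(mktemp -d)
ai_pid_file="${pid_dir}/ai-supervisor.pid"

run_pipeline_stub() {
	# Placeholder for the AI pipeline; reports whether the PID file exists mid-run.
	if [ -f "$ai_pid_file" ]; then
		echo "protected"
	else
		echo "unprotected"
	fi
}

# 1. Register the current shell as the owner of the AI session.
echo "$$" >"$ai_pid_file"

# 2. Run the pipeline while the PID file is in place.
during_run=$(run_pipeline_stub)

# 3. Deregister once the pipeline has finished.
rm -f "$ai_pid_file"

if [ -f "$ai_pid_file" ]; then after_run="present"; else after_run="removed"; fi
echo "during: ${during_run}, after: ${after_run}"
```

A concurrent cleanup pass that consults the PID directory would see the file only between steps 1 and 3, which is the window the diagram marks as protected.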
Estimated Code Review Effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Code Review
The pull request effectively addresses the issue of SIGTERM kills (rc=143) during the AI reasoning phase by increasing timeouts and implementing a protection mechanism for AI-spawned processes. The use of a temporary PID file (ai-supervisor.pid) that the orphan cleanup phase respects is a robust solution for handling concurrent supervisor pulses. I have identified a few style guide violations regarding local variable declarations and blanket error suppression with 2>/dev/null that should be addressed to maintain consistency and improve debuggability, aligning with repository-specific rules.
if [[ "$sweep_basename" == "ai-supervisor" ]] && kill -0 "$sweep_pid" 2>/dev/null; then
	protected_pids="${protected_pids} ${sweep_pid}"
	local ai_sweep_descendants
	ai_sweep_descendants=$(_list_descendants "$sweep_pid" 2>/dev/null || true)
Avoid blanket suppression of stderr with 2>/dev/null. If _list_descendants fails, the error message should be visible in the logs for diagnosis, especially since this logic is critical for protecting AI processes.
- ai_sweep_descendants=$(_list_descendants "$sweep_pid" 2>/dev/null || true)
+ ai_sweep_descendants=$(_list_descendants "$sweep_pid" || true)
References
- 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. (link)
- Avoid blanket suppression of errors with '2>/dev/null' to ensure that authentication, syntax, or system issues remain visible for debugging.
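One way to keep the caller resilient without discarding diagnostics is to append stderr to a log file, as the PR already does for the AI pipeline. The sketch below is illustrative only: list_descendants_stub is a hypothetical helper that always fails, and the log file is a temporary file rather than the supervisor's real log.

```shell
# Hypothetical log destination for this sketch.
log_file=$(mktemp)

list_descendants_stub() {
	# Hypothetical helper that fails and explains why on stderr.
	echo "ps: lookup failed for pid $1" >&2
	return 1
}

# stderr is appended to the log rather than discarded;
# '|| true' still keeps a failure from aborting the caller.
descendants=$(list_descendants_stub 12345 2>>"$log_file" || true)

logged=$(cat "$log_file")
echo "descendants='${descendants}' logged='${logged}'"
```

The caller still receives an empty result on failure, but the reason is preserved for later diagnosis.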
  fi
  local sweep_task_status
- sweep_task_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$(basename "$pid_file" .pid)")';" 2>/dev/null || echo "")
+ sweep_task_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$sweep_basename")';" 2>/dev/null || echo "")
Avoid blanket suppression of stderr with 2>/dev/null. Database errors should be visible in the logs to help diagnose issues with task status lookups during the sweep.
- sweep_task_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$sweep_basename")';" 2>/dev/null || echo "")
+ sweep_task_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$sweep_basename")';" || echo "")
References
- 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. (link)
- Avoid blanket suppression of errors with '2>/dev/null' to ensure that authentication, syntax, or system issues remain visible for debugging.
# Write a PID file for the AI session so Phase 4e does not kill
# opencode/claude processes spawned during reasoning or action execution
# (t1301: concurrent pulses can trigger Phase 4e while AI is running).
local ai_pid_file="${SUPERVISOR_DIR}/pids/ai-supervisor.pid"
Style guide violation: declare and assign separately for local variables to ensure exit code safety.
- local ai_pid_file="${SUPERVISOR_DIR}/pids/ai-supervisor.pid"
+ local ai_pid_file
+ ai_pid_file="${SUPERVISOR_DIR}/pids/ai-supervisor.pid"
References
- Use local var="$1" pattern in functions (declare and assign separately for exit code safety). (link)
- Separate 'local' declaration from assignment for local variables to ensure exit code safety and consistency.
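The exit-code concern behind this rule can be seen in a small sketch. The function names below (fails, combined, separate) are hypothetical and exist only for the demonstration.

```shell
# A command that fails with a distinctive status.
fails() { return 3; }

combined() {
	# 'local' itself succeeds, so the status of fails() is masked.
	local value=$(fails)
	echo "$?"
}

separate() {
	# Declaring first lets the assignment report the status of fails().
	local value
	value=$(fails)
	echo "$?"
}

combined_rc=$(combined)
separate_rc=$(separate)
echo "combined: ${combined_rc}, separate: ${separate_rc}"
# prints "combined: 0, separate: 3"
```

For a plain string assignment such as the flagged line there is no command substitution whose status could be masked, so the split there is mainly about consistency with the style guide.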
# opencode/claude processes spawned during reasoning or action execution
# (t1301: concurrent pulses can trigger Phase 4e while AI is running).
local ai_pid_file="${SUPERVISOR_DIR}/pids/ai-supervisor.pid"
echo "$$" >"$ai_pid_file" 2>/dev/null || true
Avoid blanket suppression of stderr with 2>/dev/null. This masks potential system errors (e.g., permission denied, directory missing) that are important for debugging, especially since this PID file is critical for the new protection logic.
- echo "$$" >"$ai_pid_file" 2>/dev/null || true
+ echo "$$" >"$ai_pid_file" || true
References
- 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. (link)
- Avoid blanket suppression of errors with '2>/dev/null' to ensure that authentication, syntax, or system issues remain visible for debugging.
ai_result=$(run_ai_actions_pipeline "$ai_repo_path" "full" 2>>"$ai_log_file") || ai_rc=$?

# Remove AI session PID file now that the pipeline has completed
rm -f "$ai_pid_file" 2>/dev/null || true
Avoid blanket suppression of stderr with 2>/dev/null. rm -f is already silent if the file doesn't exist; suppressing all errors can mask permission issues or other filesystem problems.
- rm -f "$ai_pid_file" 2>/dev/null || true
+ rm -f "$ai_pid_file" || true
References
- 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. (link)
- Avoid blanket suppression of errors with '2>/dev/null' to ensure that authentication, syntax, or system issues remain visible for debugging.
e1a1bbd to c8073e7
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Sun Feb 22 02:18:51 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Root cause: concurrent supervisor pulses were breaking stale locks (600s timeout < actual pulse duration with Phase 14 AI reasoning) and running Phase 4e orphan cleanup, which killed opencode/claude processes mid-reasoning.

Three-part fix, plus a diagnostic log message:
1. Increase PULSE_LOCK_TIMEOUT from 600s to 1800s (30 min) to prevent concurrent pulses when AI reasoning runs long (108KB+ context, ~6+ min)
2. Write ai-supervisor.pid during Phase 14 AI pipeline execution so Phase 4e protects the pulse process and all its AI-spawned descendants from orphan cleanup by concurrent pulses
3. Increase SUPERVISOR_AI_TIMEOUT from 300s to 600s (10 min) via launchd plist env var to give the AI CLI more time for large context prompts
4. Add explicit rc=143 log message in Phase 14 to aid future diagnosis

Also resolves TODO.md merge conflict from concurrent supervisor edits.
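For readers unfamiliar with the number, rc=143 is how a shell reports a child terminated by SIGTERM (128 plus signal number 15). The sketch below reproduces that status with a throwaway sleep process; it is an illustration of the convention, not the log branch added in Phase 14, and the message text is made up.

```shell
# Start a long-running child, terminate it with SIGTERM, and collect its status.
sleep 30 &
child_pid=$!
kill -TERM "$child_pid"

# Capture the status without letting a non-zero result abort the script.
rc=0
wait "$child_pid" || rc=$?

if [ "$rc" -eq 143 ]; then
	echo "child terminated by SIGTERM (rc=143 = 128 + 15)"
else
	echo "child exited with rc=${rc}"
fi
```

The `|| rc=$?` capture mirrors the `|| ai_rc=$?` pattern used around the AI pipeline call in this PR.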
c8073e7 to 488cb84
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Sun Feb 22 02:37:09 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 1
🧹 Nitpick comments (3)
.agents/scripts/supervisor/pulse.sh (2)
2930-2946: Core protection logic is correct; descendant matching has a pre-existing newline-delimiter flaw now extended here.

The kill -0 guard, the continue that skips the DB-status path, and the sweep_basename refactor are all clean. One subtle issue is worth fixing: _list_descendants emits newline-separated PIDs, so when they are concatenated into protected_pids (a space-delimited string) and checked with grep -q " ${opid} ", grep matches line by line. Any PID that lands in the middle of the newline-embedded string (preceded by \n rather than a space) won't match " PID " and won't be skipped by the orphan check. Only the last descendant in the output (which gets the trailing space from echo "...") reliably matches. The same flaw exists in the pre-existing sweep_descendants block at line 2950; this PR duplicates the pattern for ai_sweep_descendants.

🛡️ Proposed fix: normalize newlines to spaces before concatenation (apply to both occurrences)

- if [[ -n "$ai_sweep_descendants" ]]; then
-   protected_pids="${protected_pids} ${ai_sweep_descendants}"
- fi
+ while IFS= read -r _desc_pid; do
+   [[ -n "$_desc_pid" ]] && protected_pids="${protected_pids} ${_desc_pid}"
+ done <<< "$ai_sweep_descendants"

Apply the same fix to the pre-existing sweep_descendants block (lines 2950–2953):

- if [[ -n "$sweep_descendants" ]]; then
-   protected_pids="${protected_pids} ${sweep_descendants}"
- fi
+ while IFS= read -r _desc_pid; do
+   [[ -n "$_desc_pid" ]] && protected_pids="${protected_pids} ${_desc_pid}"
+ done <<< "$sweep_descendants"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/pulse.sh around lines 2930 - 2946, The descendant PID lists from _list_descendants are newline-delimited and are being concatenated into the space-delimited protected_pids string, causing later checks that use grep -q " ${opid} " to miss PIDs; change the concatenation to normalize newlines to spaces (e.g. convert ai_sweep_descendants and sweep_descendants output from _list_descendants into a space-separated string before appending to protected_pids) so every PID is separated by spaces and the grep check will reliably match; update the blocks that set ai_sweep_descendants and sweep_descendants accordingly while keeping the rest of the logic (kill -0, continue, DB status) unchanged.
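The delimiter problem described in this comment can be reproduced in isolation. The sketch below is an assumption-laden illustration: the PIDs are made up, is_protected is a hypothetical helper that mimics a grep -q " PID " membership check, and a here-document is used in place of the here-string from the proposed fix so the loop also runs under plain sh.

```shell
# Simulated _list_descendants output: one PID per line (PIDs are made up).
descendants=$(printf '%s\n' 101 102 103)

is_protected() {
	# $1 = protected list, $2 = PID to look up, using a space-delimited match.
	if echo " $1 " | grep -q " $2 "; then echo yes; else echo no; fi
}

# Naive concatenation keeps the embedded newlines.
naive="4242 ${descendants}"

# Normalised: append one PID at a time, separated by spaces.
normalised="4242"
while IFS= read -r desc_pid; do
	[ -n "$desc_pid" ] && normalised="${normalised} ${desc_pid}"
done <<EOF
${descendants}
EOF

naive_mid=$(is_protected "$naive" 102)
fixed_mid=$(is_protected "$normalised" 102)
echo "middle PID protected? naive=${naive_mid} normalised=${fixed_mid}"
# prints "middle PID protected? naive=no normalised=yes"
```

Because grep works line by line, the middle PID sits alone on its own line in the naive string and never has a space on both sides.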
3674-3694: PID lifecycle is correctly bracketed around the pipeline; consider adding cleanup to the EXIT trap as defense-in-depth.

The write-before/remove-after placement is correct, and || true on both operations prevents write errors from aborting the pulse. If the pulse is killed (SIGTERM/SIGKILL) between lines 3678 and 3694, the file persists until the next successful Phase 14 run overwrites it. The stale file is self-healing (a dead $$ causes kill -0 to fail, so no protection triggers), but adding explicit cleanup to the EXIT trap at line 696 would be cleaner since the path is deterministic:

♻️ Optional: harden the EXIT trap to clean up the PID file on unexpected pulse termination

- trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" EXIT INT TERM
+ trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' '${SUPERVISOR_DIR}/pids/ai-supervisor.pid' 2>/dev/null || true" EXIT INT TERM

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/pulse.sh around lines 3674 - 3694, Add deterministic cleanup for the AI session PID file to the existing EXIT trap: detect the same ai_pid_file name used when creating the PID (variable ai_pid_file / ai-supervisor.pid), and in the EXIT trap read the file and only remove it if its contents match the current shell PID ($$) to avoid deleting another run's PID; ensure removal uses rm -f and errors are ignored (|| true) so the trap never aborts.

.agents/scripts/supervisor/launchd.sh (1)
150-160: Optional: parameterise the hardcoded timeouts and move the comment before the assignment.

Two minor housekeeping points:
- 1800 and 600 are hardcoded literals. A future timeout change requires editing both supervisor-helper.sh (line 206) and this function. Consider accepting them as optional positional args (e.g., $7/$8 with defaults) so callers can override without touching plist generation logic.
- The explanatory comment block sits after the env_dict closing ", making it visually disconnected from the values it documents. Moving it immediately before the env_dict= assignment would read more naturally.

♻️ Suggested refactor (comment placement + optional parameterisation)

  _generate_supervisor_pulse_plist() {
    local script_path="$1"
    local interval_seconds="${2:-120}"
    local log_path="$3"
    local batch_arg="${4:-}"
    local env_path="${5:-/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin}"
    local gh_token="${6:-}"
+   # SUPERVISOR_PULSE_LOCK_TIMEOUT (default 1800s/30min): must exceed max pulse
+   # duration including Phase 14 AI reasoning to prevent concurrent pulses from
+   # breaking a live lock and running Phase 4e orphan cleanup (t1301).
+   # SUPERVISOR_AI_TIMEOUT (default 600s/10min): headroom for large-context AI prompts.
+   local pulse_lock_timeout="${7:-1800}"
+   local ai_timeout="${8:-600}"
    ...
-   # SUPERVISOR_PULSE_LOCK_TIMEOUT=1800s (30 min): prevents concurrent pulses when
-   # ...
    env_dict="<key>PATH</key>
    <string>${env_path}</string>
    <key>SUPERVISOR_PULSE_LOCK_TIMEOUT</key>
-   <string>1800</string>
+   <string>${pulse_lock_timeout}</string>
    <key>SUPERVISOR_AI_TIMEOUT</key>
-   <string>600</string>"
+   <string>${ai_timeout}</string>"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/launchd.sh around lines 150 - 160, Parameterise the hardcoded timeouts and move the explanatory comment before the plist/environment block: update the code that builds env_dict (referencing env_dict) to accept optional positional args (e.g., $7 for SUPERVISOR_PULSE_LOCK_TIMEOUT and $8 for SUPERVISOR_AI_TIMEOUT) with defaults of 1800 and 600 respectively, replace the literal "1800" and "600" strings with those variables when emitting SUPERVISOR_PULSE_LOCK_TIMEOUT and SUPERVISOR_AI_TIMEOUT, and relocate the explanatory comment block so it appears immediately before the env_dict= assignment (and document the new parameters so callers can override timeouts without editing plist generation).
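The parameterisation suggested in this comment can be sketched as a small standalone function. build_env_dict is a hypothetical name, its argument positions differ from the real _generate_supervisor_pulse_plist, and only the defaulting pattern is the point here.

```shell
build_env_dict() {
	# $1 = PATH value; $2 and $3 are optional timeout overrides in seconds.
	local env_path="$1"
	local pulse_lock_timeout="${2:-1800}"
	local ai_timeout="${3:-600}"
	printf '<key>PATH</key>\n<string>%s</string>\n' "$env_path"
	printf '<key>SUPERVISOR_PULSE_LOCK_TIMEOUT</key>\n<string>%s</string>\n' "$pulse_lock_timeout"
	printf '<key>SUPERVISOR_AI_TIMEOUT</key>\n<string>%s</string>\n' "$ai_timeout"
}

# Defaults apply when the optional arguments are omitted.
default_dict=$(build_env_dict "/usr/bin:/bin")

# Callers can override either timeout without editing the generator.
custom_dict=$(build_env_dict "/usr/bin:/bin" 2400 900)

echo "$default_dict"
```

With this shape a future timeout change touches only the caller or the defaults, not the plist template itself.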
<key>SUPERVISOR_PULSE_LOCK_TIMEOUT</key>
<string>1800</string>
<key>SUPERVISOR_AI_TIMEOUT</key>
<string>600</string>"
🧩 Analysis chain
🏁 Script executed:
rg -n 'SUPERVISOR_AI_TIMEOUT' --type sh
Repository: marcusquinn/aidevops
Length of output: 357
🏁 Script executed:
# Check cron.sh for any SUPERVISOR_AI_TIMEOUT handling or environment setup
fd -t f 'cron.sh' --type f
Repository: marcusquinn/aidevops
Length of output: 98
🏁 Script executed:
# Check pulse.sh for SUPERVISOR_AI_TIMEOUT references
fd -t f 'pulse.sh' --type f
Repository: marcusquinn/aidevops
Length of output: 137
🏁 Script executed:
# Check cron.sh to see if it sets up environment variables like SUPERVISOR_AI_TIMEOUT
cat -n .agents/scripts/supervisor/cron.sh | head -100
Repository: marcusquinn/aidevops
Length of output: 3323
🏁 Script executed:
# Check pulse.sh to understand the entry point and environment setup
cat -n .agents/scripts/supervisor/pulse.sh | head -150
Repository: marcusquinn/aidevops
Length of output: 5725
🏁 Script executed:
# Check ai-reason.sh around line 246 to see the timeout usage
sed -n '240,260p' .agents/scripts/supervisor/ai-reason.sh
Repository: marcusquinn/aidevops
Length of output: 722
🏁 Script executed:
# Check _cmd_cron_linux function to see if it handles SUPERVISOR_AI_TIMEOUT
sed -n '100,200p' .agents/scripts/supervisor/cron.sh
Repository: marcusquinn/aidevops
Length of output: 2706
🏁 Script executed:
# Check supervisor-helper.sh to see how it invokes pulse and whether it sets env vars
rg -n 'SUPERVISOR_AI_TIMEOUT|supervisor-pulse' .agents/scripts/supervisor/supervisor-helper.sh -A 3 -B 3
Repository: marcusquinn/aidevops
Length of output: 151
🏁 Script executed:
# Find supervisor-helper.sh
fd -t f 'supervisor-helper.sh'
Repository: marcusquinn/aidevops
Length of output: 100
🏁 Script executed:
# Check the Linux cron backend to understand how environment variables are passed
sed -n '150,200p' .agents/scripts/supervisor/cron.sh
Repository: marcusquinn/aidevops
Length of output: 1638
🏁 Script executed:
# Check supervisor-helper.sh to see how it invokes pulse and whether it sets SUPERVISOR_AI_TIMEOUT
cat -n .agents/scripts/supervisor-helper.sh | head -100
Repository: marcusquinn/aidevops
Length of output: 7703
🏁 Script executed:
# Search for how pulse is called and whether SUPERVISOR_AI_TIMEOUT is set in supervisor-helper.sh
rg -n 'pulse|SUPERVISOR_AI_TIMEOUT' .agents/scripts/supervisor-helper.sh -A 2 -B 2
Repository: marcusquinn/aidevops
Length of output: 9204
🏁 Script executed:
# Check if SUPERVISOR_AI_TIMEOUT is set anywhere else in the codebase (environment files, config, etc.)
rg -n 'SUPERVISOR_AI_TIMEOUT' --type sh --type bash
Repository: marcusquinn/aidevops
Length of output: 92
🏁 Script executed:
# Confirm the exact lines in launchd.sh where SUPERVISOR_AI_TIMEOUT is injected
sed -n '150,165p' .agents/scripts/supervisor/launchd.sh
Repository: marcusquinn/aidevops
Length of output: 789
🏁 Script executed:
# Final verification: check what default SUPERVISOR_AI_TIMEOUT is in ai-reason.sh
sed -n '240,250p' .agents/scripts/supervisor/ai-reason.sh
Repository: marcusquinn/aidevops
Length of output: 354
Cron-based installs do not receive the new SUPERVISOR_AI_TIMEOUT=600 timeout.
The launchd plist injection sets SUPERVISOR_AI_TIMEOUT=600 for macOS, but the Linux cron backend (_cmd_cron_linux) passes only PATH as an environment variable. Linux cron users will fall back to the hardcoded default of 300 seconds in ai-reason.sh:246, retaining the old (shorter) AI timeout and remaining vulnerable to the same large-context timeout kills that this PR addresses on macOS. Either inject SUPERVISOR_AI_TIMEOUT=600 in the cron command or expose it as a configurable environment variable for cron installs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/launchd.sh around lines 151 - 154, The cron-based
installer isn’t being given SUPERVISOR_AI_TIMEOUT, so update the cron launch
code (_cmd_cron_linux) to export SUPERVISOR_AI_TIMEOUT=600 (or read from an
existing config/env) before executing the supervisor command so ai-reason.sh’s
timeout fallback (the code referenced in ai-reason.sh around line 246) sees the
new value; alternatively expose SUPERVISOR_AI_TIMEOUT as a configurable
environment variable that _cmd_cron_linux will preserve/forward when writing the
cron job so Linux cron installs get the same 600s AI timeout as the macOS
launchd plist.
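One way the cron backend could forward the timeout is to prefix the command in the generated crontab entry with the variable assignment. The sketch below is hypothetical: build_cron_line is not a function from the repository, and the schedule and script path are placeholders.

```shell
build_cron_line() {
	# $1 = schedule, $2 = command to run.
	# Forwards SUPERVISOR_AI_TIMEOUT if the installer's environment sets it, else 600.
	local schedule="$1"
	local cmd="$2"
	local ai_timeout="${SUPERVISOR_AI_TIMEOUT:-600}"
	echo "${schedule} SUPERVISOR_AI_TIMEOUT=${ai_timeout} ${cmd}"
}

# Default: nothing set in the environment, so 600 is written into the entry.
unset SUPERVISOR_AI_TIMEOUT
default_line=$(build_cron_line "*/2 * * * *" "/path/to/supervisor-helper.sh pulse")

# Override: an operator-provided value is preserved in the entry.
override_line=$(SUPERVISOR_AI_TIMEOUT=900 build_cron_line "*/2 * * * *" "/path/to/supervisor-helper.sh pulse")

echo "$default_line"
echo "$override_line"
```

Since cron hands the command field to a shell, the leading assignment sets the variable for that one invocation, so the 300-second fallback in ai-reason.sh would no longer apply.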



Summary
- […] opencode/claude processes mid-reasoning, producing rc=143 (SIGTERM)
- […] PULSE_LOCK_TIMEOUT 600s→1800s to prevent concurrent pulses, (2) write ai-supervisor.pid during Phase 14 so Phase 4e protects AI-spawned processes, (3) increase SUPERVISOR_AI_TIMEOUT 300s→600s for large context prompts

Changes
- supervisor-helper.sh: PULSE_LOCK_TIMEOUT 600→1800s
- supervisor/launchd.sh: SUPERVISOR_PULSE_LOCK_TIMEOUT=1800 and SUPERVISOR_AI_TIMEOUT=600 in launchd plist env
- supervisor/pulse.sh: ai-supervisor.pid around Phase 14 pipeline; protect it in Phase 4e orphan sweep; add rc=143 log branch
- supervisor/utility.sh
- TODO.md

Root Cause Analysis
The rc=143 kills at 01:10 and 01:17 on 2026-02-22 were caused by:
- […] PULSE_LOCK_TIMEOUT)
- […] opencode/claude processes with PPID=1 (reparented after pulse A's subshell exited)

The fix ensures: (a) the lock timeout exceeds the maximum pulse duration, and (b) even if a concurrent pulse breaks the lock, the ai-supervisor.pid file causes Phase 4e to protect the AI session's processes.

Testing
- ai-supervisor.pid protection uses a kill -0 liveness check, so stale PID files from crashed pulses don't cause false protection

Closes #2119
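The liveness check mentioned under Testing can be sketched on its own. is_live_pid_file is a hypothetical helper and the PID files live in a temporary directory; the sketch only shows why a PID file left behind by a dead process grants no protection.

```shell
pid_dir=$(mktemp -d)

is_live_pid_file() {
	# Succeeds only if the file exists, holds a PID, and that process is alive.
	local pid
	[ -f "$1" ] || return 1
	pid=$(cat "$1")
	[ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Live case: this shell's own PID.
echo "$$" >"${pid_dir}/live.pid"

# Stale case: a child that has already exited and been reaped.
sleep 0 &
dead_pid=$!
wait "$dead_pid"
echo "$dead_pid" >"${pid_dir}/stale.pid"

if is_live_pid_file "${pid_dir}/live.pid"; then live_result="protected"; else live_result="ignored"; fi
if is_live_pid_file "${pid_dir}/stale.pid"; then stale_result="protected"; else stale_result="ignored"; fi
echo "live: ${live_result}, stale: ${stale_result}"
```

kill -0 sends no signal; it only reports whether the target process exists and can be signalled, which is what makes a stale file harmless.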