t1165.3: Remote container support — dispatch to containers on remote hosts via SSH/Tailscale#2109
t1165.3: Remote container support — dispatch to containers on remote hosts via SSH/Tailscale#2109marcusquinn merged 3 commits intomainfrom
Conversation
…tch (t1165.3) Implements remote container dispatch with: - Host configuration management (add/remove/list) - SSH and Tailscale transport support - Connectivity verification (Docker, OrbStack, AI CLI, SSH agent) - Credential forwarding (API keys, GH tokens, SSH agent) - Remote workspace creation and worker dispatch - Log collection (stream and batch) - Remote worker status monitoring - Cleanup of remote resources Chose SSH agent forwarding (-A) over explicit key copying for security. Chose JSON config file for hosts over SQLite to keep it simple and editable.
….sh (t1165.3) - Add dispatch_target column to tasks table (database migration + init schema) - Route tasks with dispatch_target to remote-dispatch-helper.sh in cmd_dispatch() - Add remote worker status checking in cmd_worker_status() - Handle remote PID format (remote:host:pid) in pulse Phase 1 - Auto-collect remote logs before evaluation when remote worker finishes - Chose dispatch_target column approach over separate remote_tasks table for simplicity
WalkthroughThis pull request introduces remote container dispatch capabilities for AI workers, enabling dispatching to containers on remote hosts via SSH or Tailscale. Changes include a new helper script for remote orchestration, database schema updates tracking dispatch targets, supervisor integration for routing tasks to remote hosts, and comprehensive documentation. Changes
Sequence DiagramsequenceDiagram
participant Local as Local Supervisor
participant DB as SQLite DB
participant Helper as Remote Dispatch Helper
participant SSH as SSH/Tailscale
participant Remote as Remote Host
participant Worker as Remote AI Worker
Local->>DB: Check task dispatch_target
DB-->>Local: dispatch_target = "host-name"
Local->>Helper: dispatch task_id "host-name"
Helper->>Helper: _resolve_host("host-name")
Helper->>SSH: Test SSH/Tailscale reachability
SSH-->>Helper: ✓ Connected
Helper->>Remote: Validate remote (docker, AI CLI, agent fwd)
Remote-->>Helper: ✓ Capabilities confirmed
Helper->>Helper: _build_credential_env() + SSH agent fwd
Helper->>Remote: Create workspace & upload dispatch/wrapper scripts
Remote-->>Helper: ✓ Scripts uploaded
Helper->>Remote: Launch remote worker (optional: in container)
Remote->>Worker: Start AI worker process
Worker-->>Remote: Worker running (PID: 12345)
Remote-->>Helper: remote:host-name:12345
Helper-->>Local: Return remote PID
Local->>DB: Store remote:host-name:12345 in pid_file
Local->>DB: Mark task RUNNING with remote session label
rect rgba(200, 150, 255, 0.5)
Note over Local,DB: Status Monitoring Phase
Local->>DB: Get PID for task
DB-->>Local: remote:host-name:12345
Local->>Helper: cmd_status task_id remote:host-name:12345
Helper->>SSH: Query remote process & logs
SSH-->>Helper: Worker status
Helper-->>Local: alive/dead + log data
end
rect rgba(150, 200, 255, 0.5)
Note over Local,DB: Log Collection & Cleanup
Local->>Helper: cmd_logs task_id (collect final logs)
Helper->>Remote: Fetch logs from workspace
Remote-->>Helper: Log content
Helper-->>Local: Logs saved locally
Local->>Helper: cmd_cleanup task_id (remove remote workspace)
Helper->>Remote: Kill worker process & remove workspace
Remote-->>Helper: ✓ Cleaned
Helper-->>Local: ✓ Cleanup done
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Feb 21 15:45:15 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
…t1165.3) - Add tools/containers/remote-dispatch.md with full usage guide - Update subagent-index.toon to include remote-dispatch - Update AGENTS.md domain index with remote-dispatch reference - Add remote dispatch cross-reference in headless-dispatch.md
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Feb 21 15:48:15 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
Summary of ChangesHello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces robust remote container support, enabling AI workers to be dispatched to and managed on remote hosts via SSH or Tailscale. This significantly enhances the system's scalability and resource utilization by allowing tasks to leverage specialized hardware or distributed environments. The changes include a new helper script for managing remote hosts, secure credential forwarding, automated log collection, and seamless integration with the existing supervisor dispatch and pulse mechanisms. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
There was a problem hiding this comment.
Code Review
The pull request introduces a robust remote container dispatch system, allowing AI workers to run on remote hosts via SSH or Tailscale. The implementation includes host management, credential forwarding, and automated log collection integrated into the supervisor's pulse loop. The code generally follows the project's shell scripting standards, though there are some opportunities to improve adherence to the style guide regarding temporary file handling and explicit return statements. Security-wise, ensuring restricted permissions on the remote workspace is recommended to protect forwarded API keys.
| local remote_log_file="${remote_work_dir}/worker.log" | ||
|
|
||
| _log_info "Creating remote workspace: $remote_work_dir" | ||
| if ! "${ssh_cmd[@]}" "mkdir -p '${remote_work_dir}'" 2>/dev/null; then |
There was a problem hiding this comment.
The remote workspace is created in /tmp and will contain dispatch.sh, which includes sensitive API keys. To prevent exposure to other users on the remote host, the directory should be created with restricted permissions (e.g., 700).
| if ! "${ssh_cmd[@]}" "mkdir -p '${remote_work_dir}'" 2>/dev/null; then | |
| if ! "${ssh_cmd[@]}" "mkdir -m 700 -p '${remote_work_dir}'" 2>/dev/null; then |
|
|
||
| # Add host to config | ||
| local tmp_file | ||
| tmp_file=$(mktemp) |
There was a problem hiding this comment.
Temporary files must be cleaned up using the established project pattern: _save_cleanup_scope, trap '_run_cleanups' RETURN, and push_cleanup for robust cleanup on any exit path. The current direct trap should be replaced with an integration into this project-specific cleanup mechanism.
References
- Temp files must have trap cleanup (RETURN or EXIT) (link)
- For resource cleanup in shell scripts, use the established project pattern: use
_save_cleanup_scope,trap '_run_cleanups' RETURN, andpush_cleanupfor robust cleanup on any exit path, and also include explicit manual cleanup at the end of the normal execution path as a 'fast-path'.
| _build_remote_command_with_creds() { | ||
| local remote_command="$1" | ||
| shift | ||
|
|
||
| # Build env export prefix for the remote command | ||
| local env_prefix="" | ||
| local -a cred_env=() | ||
| while IFS= read -r line; do | ||
| [[ -z "$line" ]] && continue | ||
| cred_env+=("$line") | ||
| done < <(_build_credential_env) | ||
|
|
||
| if [[ ${#cred_env[@]} -gt 0 ]]; then | ||
| # Use env command on remote to set variables (avoids shell escaping issues) | ||
| env_prefix="env" | ||
| for var in "${cred_env[@]}"; do | ||
| env_prefix+=" $(printf '%q' "$var")" | ||
| done | ||
| env_prefix+=" " | ||
| fi | ||
|
|
||
| echo "${env_prefix}${remote_command}" | ||
| return 0 | ||
| } | ||
|
|
| # Clone or update repo | ||
| if [[ -d repo/.git ]]; then | ||
| cd repo | ||
| git fetch origin |
There was a problem hiding this comment.
When using git commands in shell scripts, prefer using the -q flag to suppress standard output instead of redirecting to /dev/null. This ensures that actual error messages on stderr remain visible for debugging.
| git fetch origin | |
| git fetch -q origin |
References
- When using git commands (like 'init', 'remote') in shell scripts, use the '-q' flag to suppress standard output instead of '2>/dev/null', ensuring that error messages on stderr remain visible for debugging.
| local address="" transport="" user="" | ||
|
|
||
| if [[ -f "$meta_file" ]]; then | ||
| remote_log_file=$(jq -r '.remote_log_file' "$meta_file" 2>/dev/null || echo "$remote_log_file") |
There was a problem hiding this comment.
Suppressing stderr with 2>/dev/null is redundant here because the existence of $meta_file has already been verified on the preceding line. Avoiding unnecessary suppression helps in identifying other potential issues like permission errors.
| remote_log_file=$(jq -r '.remote_log_file' "$meta_file" 2>/dev/null || echo "$remote_log_file") | |
| remote_log_file=$(jq -r '.remote_log_file' "$meta_file" || echo "$remote_log_file") |
References
- Avoid using 2>/dev/null to suppress errors on file operations if the file's existence has already been verified by a preceding check. This practice is redundant for 'file not found' errors and can mask other important issues like permissions problems.
| return 1 | ||
| ;; | ||
| esac | ||
| } |
There was a problem hiding this comment.
All functions must have explicit return statements according to the Repository Style Guide (line 12).
| } | |
| return 0 | |
| } |
References
- All functions must have explicit return statements (link)
There was a problem hiding this comment.
Actionable comments posted: 12
🧹 Nitpick comments (4)
.agents/scripts/remote-dispatch-helper.sh (1)
412-435:_build_remote_command_with_credsis defined but never called — dead code.Credential forwarding in
cmd_dispatchis done directly viacred_env_strembedded in the dispatch heredoc. This helper is entirely unused.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/remote-dispatch-helper.sh around lines 412 - 435, The helper function _build_remote_command_with_creds is dead code; either remove it or wire it into cmd_dispatch to centralize credential-forwarding logic: update cmd_dispatch to stop embedding cred_env_str directly in the heredoc and instead call _build_remote_command_with_creds(remote_command) to produce the final command (or delete _build_remote_command_with_creds if you prefer removal), and ensure any remaining uses of cred_env_str are removed so there’s a single source of truth for credential env construction..agents/scripts/supervisor/dispatch.sh (2)
3264-3270: Connectivity check before remote dispatch is a good defensive pattern, but consider logging the check output on failure.The check output is discarded (
>/dev/null 2>&1). When the remote host is unreachable, operators need to know why (DNS resolution, timeout, SSH key, Tailscale auth). Consider capturing stderr for the log.♻️ Log connectivity check failure details
local remote_check_rc=0 - "$remote_helper" check "$dispatch_target" >/dev/null 2>&1 || remote_check_rc=$? + local remote_check_output="" + remote_check_output=$("$remote_helper" check "$dispatch_target" 2>&1) || remote_check_rc=$? if [[ "$remote_check_rc" -ne 0 ]]; then log_error "Remote host $dispatch_target is unreachable — deferring dispatch (t1165.3)" + log_error "Check output: ${remote_check_output:0:200}" return 3 fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/dispatch.sh around lines 3264 - 3270, The connectivity check currently discards output; change the call to capture the command's stderr/stdout (e.g., into a variable like check_output) when invoking "$remote_helper" check "$dispatch_target", preserve remote_check_rc on failure, and include the captured output in the log_error message so operators see the actual failure reason; update references around remote_helper, dispatch_target, remote_check_rc, and log_error to use the new check_output variable when logging.
3249-3294: Remote dispatch block is placed after local wrapper/dispatch script generation — unnecessary work for remote targets.Lines 3159–3247 build the local
dispatch_scriptandwrapper_script(including the worktree creation at line 2942), but these are never used whendispatch_targetis set. Moving the remote-dispatch check earlier (e.g., right after the model resolution block) would skip worktree creation, MCP config generation, and local script generation for remote targets — saving I/O and reducing code path complexity.This isn't a correctness issue, but as the remote dispatch path grows, the wasted setup becomes more significant (worktree creation involves
git worktree add).♻️ Suggested restructure: move remote check earlier
Move the remote dispatch target check (lines 3249–3294) to just after the model resolution and recording block (~line 3104), before the local dispatch script generation starts. You'd need to keep the model resolution, health checks, and the
resolved_modelassignment, but skip worktree creation and local script generation entirely for remote targets.+ # t1165.3: Remote dispatch — check before local worktree/script setup + local dispatch_target="" + dispatch_target=$(db "$SUPERVISOR_DB" "SELECT COALESCE(dispatch_target, '') FROM tasks WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null) || dispatch_target="" + + if [[ -n "$dispatch_target" ]]; then + # ... remote dispatch logic (lines 3256-3292) ... + fi + + # Local dispatch continues below — create worktree, scripts, etc.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/dispatch.sh around lines 3249 - 3294, The remote-dispatch detection and handling (the block using dispatch_target, remote-dispatch-helper.sh, remote_check_rc, remote_pid, cmd_transition and add_model_label) should be moved earlier — directly after model resolution and recording (i.e., after resolved_model is set) and before any local worktree/mcp/wrapper/dispatch_script generation; update the script so that once dispatch_target is non-empty you perform the remote connectivity check and remote dispatch flow there and return early, thereby skipping git worktree creation and local script generation for remote targets..agents/tools/containers/remote-dispatch.md (1)
107-132: Supervisor integration section shows rawsqlite3for settingdispatch_target— note the lack of CLI wrapper.Line 123–126 instructs users to run raw
sqlite3commands against the supervisor DB. This is functional but fragile (no validation, easy to mistype). The "Future" note on line 132 about TODO.md integration is good. Consider also noting that asupervisor-helper.sh set-target <task_id> <host>CLI command would be a natural next step, so users know this is intentional and not an oversight.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/tools/containers/remote-dispatch.md around lines 107 - 132, The docs show a raw sqlite3 command to set dispatch_target directly in the supervisor DB which is fragile; update the .agents/tools/containers/remote-dispatch.md supervisor integration section to replace or augment the raw sqlite3 example with a note recommending a CLI wrapper (e.g., introduce a suggested supervisor-helper.sh set-target <task_id> <host> command) and mention that cmd_dispatch() will read dispatch_target and that the recommended CLI will validate input, sanitize task IDs, and error on missing tasks to avoid direct DB edits; keep the raw sqlite3 example as an advanced fallback but mark it as "advanced/unsafe" and point readers to remote-dispatch-helper.sh and cmd_dispatch() for normal usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/remote-dispatch-helper.sh:
- Around line 695-714: Add the SSH user to dispatched metadata and read it back
in status/cleanup: in cmd_dispatch include "user": "${user:-}" (or the variable
used to hold the target SSH user) when writing $local_meta_file so the metadata
contains the intended SSH user; then in cmd_status and cmd_cleanup load/parse
the metadata JSON and set the local user variable from that file (instead of
hardcoding local user="") so _build_ssh_cmd uses the saved user for remote
connections; make sure to reference $local_meta_file when reading and fall back
to the current OS user if the field is absent.
- Around line 371-402: The _build_credential_env function currently assembles
plaintext secrets into dispatch_content which is uploaded as
/tmp/aidevops-worker/${task_id}/dispatch.sh and made executable with chmod +x,
leaving secrets world-readable; update the upload flow to create the remote file
with restrictive permissions (e.g., create or move the file with mode 600 or set
umask so the file is written with owner-only read/write) and immediately run
chmod 600 after upload before any other operations (and avoid using a globally
readable mode when making it executable), ensure cmd_cleanup still removes the
file, and update the misleading comment ("Does NOT forward actual secret values
over SSH") to accurately reflect that secrets are forwarded in the current
implementation; reference functions/variables: _build_credential_env,
dispatch_content, dispatch.sh, /tmp/aidevops-worker/${task_id}, and cmd_cleanup.
- Around line 797-799: The tail_lines value is interpolated into remote_cmd
without validation, allowing shell injection; update the block that sets
remote_cmd (using variables tail_lines and remote_log_file) to first validate
tail_lines is a non-negative integer (e.g. with a regex test like ^[0-9]+$ or by
converting and checking numeric) and only build remote_cmd when the check
passes; if validation fails, either unset tail_lines or error out early so no
untrusted content is inserted into the SSH-executed command.
- Around line 671-685: The container dispatch currently uses docker exec to run
the remote_wrapper path from the host filesystem and captures the wrong PID;
update the dispatch logic in the block that checks container_name so that you
(1) copy the host-side wrapper into the container (use docker cp via the ssh_cmd
session or copy into a container-accessible path) or alternatively run the
wrapper on the remote host but update the wrapper to invoke docker exec
internally; and (2) obtain the real in-container worker PID instead of the
short-lived helper PID by starting the worker inside the container and echoing
that worker PID back (e.g., after docker exec runs the wrapper that
writes/prints its own PID), then set remote_pid from that real PID so
cmd_status's kill -0 checks work; touch the references remote_wrapper,
remote_pid, ssh_cmd, docker exec and cmd_status when making these changes.
- Line 33: DEFAULT_SSH_OPTS contains OpenSSH -o flags but _build_ssh_cmd()
currently passes them into `tailscale ssh` which ignores them; update
_build_ssh_cmd() to detect when using Tailscale transport (or when target host
matches Tailscale patterns) and either (A) invoke the standard ssh client with
DEFAULT_SSH_OPTS for those targets, or (B) omit/replace DEFAULT_SSH_OPTS when
invoking `tailscale ssh` and introduce a separate TAILSCALE_SSH_OPTS or explicit
behavior for timeouts/host-checking; change references to DEFAULT_SSH_OPTS and
the call site in _build_ssh_cmd() to conditionally choose the command and
options accordingly so ConnectTimeout/StrictHostKeyChecking/ServerAliveInterval
are applied only to real ssh invocations.
In @.agents/scripts/supervisor/dispatch.sh:
- Around line 3274-3280: The current remote dispatch failure block always calls
cmd_transition "$task_id" "failed" and returns 1; change it to inspect the
helper's exit code (the value of $? inside the failure branch) and treat
transient/availability codes (e.g., 3 as used elsewhere for "defer") as a defer:
call cmd_transition "$task_id" "deferred" or use the same --error payload but
then return 3; for other non-transient/permanent exit codes keep the existing
failed transition and return 1. Locate the failure handler around the
remote_helper dispatch invocation (remote_pid=$("$remote_helper" dispatch ... )
|| { ... }) and implement the conditional branching on the helper exit code,
preserving logging to SUPERVISOR_LOG and including the dispatch_target and
task_id in messages.
- Around line 3282-3284: The PID file and session string are fragile because
they join remote:${dispatch_target}:${remote_pid} with colon-delimited parsing
used later by cmd_worker_status; update the write and parsing so the PID is
unambiguous — either switch the delimiter to a safe char (e.g., use '|' when
writing echo "remote|${dispatch_target}|${remote_pid}" to
SUPERVISOR_DIR/pids/${task_id}.pid and to the session string passed to
cmd_transition), or keep the colon but change consumers to extract the PID and
host by parsing the last colon-separated field for PID and everything between
"remote:" and the last ":" for host (update where cmd_worker_status reads
fields). Ensure cmd_transition call and PID file use the same 3- or 4-field
format consistently (reference variables dispatch_target, remote_pid,
cmd_transition, and cmd_worker_status).
In @.agents/scripts/supervisor/pulse.sh:
- Around line 1210-1235: The handling of "remote:host:pid" is only implemented
in one block; replicate that guard in every place that calls kill -0 or treats
PID files as numeric — specifically update the logic around
stale_has_live_worker (Phase 0.7), stuck_alive (Phase 1c), the Phase 4 health
check that runs rm -f "$pid_file" and then
cmd_evaluate/cmd_transition("failed"), and the live_pid_count computation —
detect if pid starts with "remote:", parse host and remote pid like the existing
block (use SCRIPT_DIR/../remote-dispatch-helper.sh), call the helper with status
to decide liveness (and collect logs via logs when transitioning), set the same
is_alive/stale_has_live_worker/stuck_alive/live_pid_count behavior as local
PIDs, and skip rm -f and local kill -0 for remote entries so remote workers are
not misidentified as dead.
- Around line 1230-1232: The current logic leaves is_alive=false when
remote-dispatch-helper.sh is not executable, causing remote tasks to be treated
as dead; update the remote task branch so that when the helper cannot be invoked
(the branch that currently calls log_warn " $tid: remote-dispatch-helper.sh not
found, cannot check remote worker") you set is_alive=true (conservative default)
and skip transitioning the task to retry/failed; only run the existing remote
health-check flow when the helper executes successfully, and keep the existing
local PID check (the elif kill -0 "$pid" branch) unchanged for non-remote PIDs.
- Around line 1213-1214: Remove the dead _remote_pid extraction: change the
local declaration that extracts remote:host:pid to only assign the host (use
local _remote_host) and delete the line creating _remote_pid; update any nearby
references to only use _remote_host when calling the helper for the status/logs
subcommands (these accept <task_id> and <host> only and obtain the PID from
metadata). After editing, run ShellCheck (-u) on
.agents/scripts/supervisor/pulse.sh (and ensure pre-commit will run ShellCheck
for .agents/scripts/**/*.sh) to satisfy lint rules.
In @.agents/tools/containers/remote-dispatch.md:
- Around line 146-151: Update the "Security notes" section to explicitly mention
that environment-variable-based credential forwarding exposes values to
/proc/<pid>/environ (readable by the same user and root on Linux) so keys aren't
written to disk but can still be read by processes with appropriate privileges,
and add a brief caveat for security-conscious deployments; also clarify the
exact transport mechanism used by the helper (whether it relies on SSH
AcceptEnv/SendEnv or uses inline exported variables like VAR=val command) so
readers know if SSH server/client env propagation can silently fail.
- Around line 262-269: Update the npm install line to use the current package
name: replace the existing `npm install -g `@anthropic-ai/claude-code`` invocation
with `npm install -g opencode-ai` so the remote host installs the correct
opencode CLI; leave the curl-based installer (`curl -fsSL
https://opencode.ai/install | bash`) unchanged.
---
Nitpick comments:
In @.agents/scripts/remote-dispatch-helper.sh:
- Around line 412-435: The helper function _build_remote_command_with_creds is
dead code; either remove it or wire it into cmd_dispatch to centralize
credential-forwarding logic: update cmd_dispatch to stop embedding cred_env_str
directly in the heredoc and instead call
_build_remote_command_with_creds(remote_command) to produce the final command
(or delete _build_remote_command_with_creds if you prefer removal), and ensure
any remaining uses of cred_env_str are removed so there’s a single source of
truth for credential env construction.
In @.agents/scripts/supervisor/dispatch.sh:
- Around line 3264-3270: The connectivity check currently discards output;
change the call to capture the command's stderr/stdout (e.g., into a variable
like check_output) when invoking "$remote_helper" check "$dispatch_target",
preserve remote_check_rc on failure, and include the captured output in the
log_error message so operators see the actual failure reason; update references
around remote_helper, dispatch_target, remote_check_rc, and log_error to use the
new check_output variable when logging.
- Around line 3249-3294: The remote-dispatch detection and handling (the block
using dispatch_target, remote-dispatch-helper.sh, remote_check_rc, remote_pid,
cmd_transition and add_model_label) should be moved earlier — directly after
model resolution and recording (i.e., after resolved_model is set) and before
any local worktree/mcp/wrapper/dispatch_script generation; update the script so
that once dispatch_target is non-empty you perform the remote connectivity check
and remote dispatch flow there and return early, thereby skipping git worktree
creation and local script generation for remote targets.
In @.agents/tools/containers/remote-dispatch.md:
- Around line 107-132: The docs show a raw sqlite3 command to set
dispatch_target directly in the supervisor DB which is fragile; update the
.agents/tools/containers/remote-dispatch.md supervisor integration section to
replace or augment the raw sqlite3 example with a note recommending a CLI
wrapper (e.g., introduce a suggested supervisor-helper.sh set-target <task_id>
<host> command) and mention that cmd_dispatch() will read dispatch_target and
that the recommended CLI will validate input, sanitize task IDs, and error on
missing tasks to avoid direct DB edits; keep the raw sqlite3 example as an
advanced fallback but mark it as "advanced/unsafe" and point readers to
remote-dispatch-helper.sh and cmd_dispatch() for normal usage.
| # --- Constants --- | ||
| readonly CONFIG_DIR="${HOME}/.config/aidevops" | ||
| readonly REMOTE_HOSTS_FILE="${REMOTE_DISPATCH_HOSTS_FILE:-${CONFIG_DIR}/remote-hosts.json}" | ||
| readonly DEFAULT_SSH_OPTS="${REMOTE_DISPATCH_SSH_OPTS:--o ConnectTimeout=10 -o StrictHostKeyChecking=accept-new -o ServerAliveInterval=30}" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n ".agents/scripts/remote-dispatch-helper.sh" | head -60Repository: marcusquinn/aidevops
Length of output: 3240
🏁 Script executed:
cat -n ".agents/scripts/remote-dispatch-helper.sh" | sed -n '230,260p'Repository: marcusquinn/aidevops
Length of output: 1054
🌐 Web query:
tailscale ssh support standard OpenSSH client options -o ConnectTimeout StrictHostKeyChecking
💡 Result:
-
If you want standard OpenSSH client options (
-o ConnectTimeout=…,-o StrictHostKeyChecking=…, etc.): use your normalsshclient to connect to the Tailscale hostname/IP (for examplessh user@deviceorssh user@100.x.y.z). Tailscale SSH is designed so you can “SSH as normal” with your existing SSH client, and it manages distribution/handling of SSH host keys (including a Tailscale-managedknown_hosts) to make this work. (tailscale.com)Example:
ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new user@device
-
If you use the
tailscale ssh …subcommand: it is not just a thin wrapper around OpenSSH, and people commonly observe that OpenSSH flags like-o StrictHostKeyChecking=noare ignored when passed totailscale ssh. (github.com)
So: use ssh … -o … (regular OpenSSH) when you need OpenSSH options; reserve tailscale ssh for cases where you specifically need the Tailscale-built SSH client behavior. (tailscale.com)
Citations:
- 1: https://tailscale.com/kb/1193/tailscale-ssh/?utm_source=openai
- 2: SSH connection issue tailscale/tailscale#16092
- 3: https://tailscale.com/kb/1193/tailscale-ssh/?utm_source=openai
SSH options silently ignored by tailscale ssh — refactor to use standard SSH client or conditionally set options.
Line 33 defines DEFAULT_SSH_OPTS with -o ConnectTimeout=10 -o StrictHostKeyChecking=accept-new -o ServerAliveInterval=30, which is appropriate for standard SSH. However, when _build_ssh_cmd() dispatches via Tailscale (lines 240–253), these options are passed directly to the tailscale ssh subcommand, which does not support standard OpenSSH -o options and silently ignores them.
This means:
ConnectTimeout=10does not apply (no timeout enforcement)StrictHostKeyChecking=accept-newis not enforced (host key verification disabled)ServerAliveInterval=30does not apply (no keep-alive)
Fix: Either use regular ssh client with -o options when connecting to Tailscale addresses, or conditionally apply DEFAULT_SSH_OPTS only for direct SSH transport and configure Tailscale SSH behavior separately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/remote-dispatch-helper.sh at line 33, DEFAULT_SSH_OPTS
contains OpenSSH -o flags but _build_ssh_cmd() currently passes them into
`tailscale ssh` which ignores them; update _build_ssh_cmd() to detect when using
Tailscale transport (or when target host matches Tailscale patterns) and either
(A) invoke the standard ssh client with DEFAULT_SSH_OPTS for those targets, or
(B) omit/replace DEFAULT_SSH_OPTS when invoking `tailscale ssh` and introduce a
separate TAILSCALE_SSH_OPTS or explicit behavior for timeouts/host-checking;
change references to DEFAULT_SSH_OPTS and the call site in _build_ssh_cmd() to
conditionally choose the command and options accordingly so
ConnectTimeout/StrictHostKeyChecking/ServerAliveInterval are applied only to
real ssh invocations.
| _build_credential_env() { | ||
| local -a env_vars=() | ||
|
|
||
| # GitHub token (for gh CLI on remote) | ||
| if [[ -n "${GH_TOKEN:-}" ]]; then | ||
| env_vars+=("GH_TOKEN=${GH_TOKEN}") | ||
| elif [[ -n "${GITHUB_TOKEN:-}" ]]; then | ||
| env_vars+=("GH_TOKEN=${GITHUB_TOKEN}") | ||
| fi | ||
|
|
||
| # Anthropic API key (for AI CLI) | ||
| if [[ -n "${ANTHROPIC_API_KEY:-}" ]]; then | ||
| env_vars+=("ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}") | ||
| fi | ||
|
|
||
| # OpenRouter API key (for model routing) | ||
| if [[ -n "${OPENROUTER_API_KEY:-}" ]]; then | ||
| env_vars+=("OPENROUTER_API_KEY=${OPENROUTER_API_KEY}") | ||
| fi | ||
|
|
||
| # Google AI key | ||
| if [[ -n "${GOOGLE_API_KEY:-}" ]]; then | ||
| env_vars+=("GOOGLE_API_KEY=${GOOGLE_API_KEY}") | ||
| fi | ||
|
|
||
| # Worker identification | ||
| env_vars+=("FULL_LOOP_HEADLESS=true") | ||
| env_vars+=("AIDEVOPS_REMOTE_DISPATCH=true") | ||
|
|
||
| printf '%s\n' "${env_vars[@]}" | ||
| return 0 | ||
| } |
There was a problem hiding this comment.
API keys are written to a world-readable file in /tmp on the remote host.
_build_credential_env assembles ANTHROPIC_API_KEY, GH_TOKEN, OPENROUTER_API_KEY, and GOOGLE_API_KEY as plain export KEY=VALUE shell statements. These are embedded in dispatch_content and uploaded to /tmp/aidevops-worker/${task_id}/dispatch.sh via chmod +x (not chmod 600), making the file readable by all local users on the remote. The keys persist there until cmd_cleanup is called.
The comment at line 405 ("Does NOT forward actual secret values over SSH") is also incorrect and misleading — the implementation does exactly that.
🔒 Proposed fix — restrict permissions immediately after upload
echo "$dispatch_content" | "${ssh_cmd[@]}" "cat > '${remote_script}' && chmod +x '${remote_script}'" 2>/dev/null || {
_log_error "Failed to upload dispatch script"
return 1
}
+# Restrict to owner-only read/execute so API keys in the script aren't world-readable
+"${ssh_cmd[@]}" "chmod 600 '${remote_script}'; chmod u+x '${remote_script}'" 2>/dev/null || true
echo "$wrapper_content" | "${ssh_cmd[@]}" "cat > '${remote_wrapper}' && chmod +x '${remote_wrapper}'" 2>/dev/null || {
_log_error "Failed to upload wrapper script"
return 1
}
+"${ssh_cmd[@]}" "chmod 700 '${remote_wrapper}'" 2>/dev/null || trueLonger-term, prefer passing API keys via SSH SendEnv/AcceptEnv or a dedicated secrets manager on the remote, keeping them out of the filesystem entirely.
Also applies to: 550-568
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/remote-dispatch-helper.sh around lines 371 - 402, The
_build_credential_env function currently assembles plaintext secrets into
dispatch_content which is uploaded as
/tmp/aidevops-worker/${task_id}/dispatch.sh and made executable with chmod +x,
leaving secrets world-readable; update the upload flow to create the remote file
with restrictive permissions (e.g., create or move the file with mode 600 or set
umask so the file is written with owner-only read/write) and immediately run
chmod 600 after upload before any other operations (and avoid using a globally
readable mode when making it executable), ensure cmd_cleanup still removes the
file, and update the misleading comment ("Does NOT forward actual secret values
over SSH") to accurately reflect that secrets are forwarded in the current
implementation; reference functions/variables: _build_credential_env,
dispatch_content, dispatch.sh, /tmp/aidevops-worker/${task_id}, and cmd_cleanup.
| if [[ "$container_name" != "auto" && "$container_name" != "none" && -n "$container_name" ]]; then | ||
| # Dispatch inside a container on the remote host | ||
| _log_info "Dispatching inside container: $container_name" | ||
| remote_pid=$("${ssh_cmd[@]}" " | ||
| nohup docker exec -d '${container_name}' bash '${remote_wrapper}' & | ||
| echo \$! | ||
| " 2>/dev/null) | ||
| else | ||
| # Dispatch directly on the remote host | ||
| _log_info "Dispatching directly on remote host" | ||
| remote_pid=$("${ssh_cmd[@]}" " | ||
| nohup setsid bash '${remote_wrapper}' >> '${remote_log_file}' 2>&1 & | ||
| echo \$! | ||
| " 2>/dev/null) | ||
| fi |
There was a problem hiding this comment.
Container dispatch is broken in two independent ways.
-
Wrong filesystem namespace:
remote_wrapperis written to the remote HOST at/tmp/aidevops-worker/${task_id}/wrapper.sh.docker exec -d 'container' bash '${remote_wrapper}'evaluates that path inside the container's filesystem, where it doesn't exist unless a/tmpvolume is explicitly mounted. This silently fails (all errors suppressed with2>/dev/null), socmd_dispatchreturns successfully while no actual work was started. -
Wrong PID tracking:
docker exec -ddetaches immediately and exits; the outernohup ... &process therefore also exits within milliseconds.echo \$!captures the PID of this dead helper, not the in-container worker. All subsequentkill -0checks incmd_statuswill always report the process as dead.
🐛 Recommended approach for container dispatch
Upload the wrapper into the container via docker cp, then exec it:
-remote_pid=$("${ssh_cmd[@]}" "
- nohup docker exec -d '${container_name}' bash '${remote_wrapper}' &
- echo \$!
-" 2>/dev/null)
+remote_pid=$("${ssh_cmd[@]}" "
+ docker cp '${remote_wrapper}' '${container_name}:${remote_work_dir}/wrapper.sh' 2>/dev/null &&
+ docker exec -d '${container_name}' bash '${remote_work_dir}/wrapper.sh' &&
+ docker exec '${container_name}' pgrep -f 'wrapper.sh' | tail -1
+" 2>/dev/null)Alternatively, dispatch directly on the host and let the worker target the container via its own docker exec calls.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/remote-dispatch-helper.sh around lines 671 - 685, The
container dispatch currently uses docker exec to run the remote_wrapper path
from the host filesystem and captures the wrong PID; update the dispatch logic
in the block that checks container_name so that you (1) copy the host-side
wrapper into the container (use docker cp via the ssh_cmd session or copy into a
container-accessible path) or alternatively run the wrapper on the remote host
but update the wrapper to invoke docker exec internally; and (2) obtain the real
in-container worker PID instead of the short-lived helper PID by starting the
worker inside the container and echoing that worker PID back (e.g., after docker
exec runs the wrapper that writes/prints its own PID), then set remote_pid from
that real PID so cmd_status's kill -0 checks work; touch the references
remote_wrapper, remote_pid, ssh_cmd, docker exec and cmd_status when making
these changes.
| cat >"$local_meta_file" <<META_EOF | ||
| { | ||
| "task_id": "${task_id}", | ||
| "host": "${host}", | ||
| "address": "${address}", | ||
| "transport": "${transport}", | ||
| "container": "${container_name}", | ||
| "remote_pid": "${remote_pid}", | ||
| "remote_work_dir": "${remote_work_dir}", | ||
| "remote_log_file": "${remote_log_file}", | ||
| "dispatched_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)", | ||
| "ai_cli": "${remote_ai_cli}", | ||
| "model": "${model:-default}" | ||
| } | ||
| META_EOF | ||
|
|
||
| _log_success "Dispatched $task_id to $host (remote PID: $remote_pid)" | ||
| _log_info "Remote workspace: $remote_work_dir" | ||
| _log_info "Remote log: $remote_log_file" | ||
| _log_info "Local metadata: $local_meta_file" |
There was a problem hiding this comment.
user is never saved to dispatch metadata, breaking cmd_status and cmd_cleanup for hosts with a custom SSH user.
cmd_dispatch writes the metadata JSON (lines 695–709) without a user field. Both cmd_status (line 851) and cmd_cleanup (line 958) hardcode local user="" rather than reading from metadata, so _build_ssh_cmd always uses the current OS user. For any host registered with --user pi, --user ubuntu, etc., status and cleanup calls will either connect as the wrong user or fail to connect entirely.
🐛 Proposed fix
In cmd_dispatch, add user to the metadata:
cat >"$local_meta_file" <<META_EOF
{
"task_id": "${task_id}",
"host": "${host}",
"address": "${address}",
"transport": "${transport}",
"container": "${container_name}",
+ "user": "${user}",
"remote_pid": "${remote_pid}",
...
}
META_EOFIn cmd_status and cmd_cleanup, read the field back:
-local user=""
+local user
+user=$(jq -r '.user // ""' "$meta_file" 2>/dev/null || echo "")
local -a ssh_cmd=()
while IFS= read -r line; do
ssh_cmd+=("$line")
done < <(_build_ssh_cmd "$address" "$transport" "$user")Also applies to: 842-856
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/remote-dispatch-helper.sh around lines 695 - 714, Add the
SSH user to dispatched metadata and read it back in status/cleanup: in
cmd_dispatch include "user": "${user:-}" (or the variable used to hold the
target SSH user) when writing $local_meta_file so the metadata contains the
intended SSH user; then in cmd_status and cmd_cleanup load/parse the metadata
JSON and set the local user variable from that file (instead of hardcoding local
user="") so _build_ssh_cmd uses the saved user for remote connections; make sure
to reference $local_meta_file when reading and fall back to the current OS user
if the field is absent.
| if [[ -n "$tail_lines" ]]; then | ||
| remote_cmd="tail -n ${tail_lines} '${remote_log_file}'" | ||
| fi |
There was a problem hiding this comment.
--tail N is not validated as a number before being interpolated into the remote command.
An untrusted or mistyped value (e.g., 1; kill -9 -1) would be injected into the SSH-executed shell command.
🛡️ Proposed fix
if [[ -n "$tail_lines" ]]; then
+ if ! [[ "$tail_lines" =~ ^[0-9]+$ ]]; then
+ _log_error "--tail requires a positive integer, got: '$tail_lines'"
+ return 1
+ fi
remote_cmd="tail -n ${tail_lines} '${remote_log_file}'"
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [[ -n "$tail_lines" ]]; then | |
| remote_cmd="tail -n ${tail_lines} '${remote_log_file}'" | |
| fi | |
| if [[ -n "$tail_lines" ]]; then | |
| if ! [[ "$tail_lines" =~ ^[0-9]+$ ]]; then | |
| _log_error "--tail requires a positive integer, got: '$tail_lines'" | |
| return 1 | |
| fi | |
| remote_cmd="tail -n ${tail_lines} '${remote_log_file}'" | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/remote-dispatch-helper.sh around lines 797 - 799, The
tail_lines value is interpolated into remote_cmd without validation, allowing
shell injection; update the block that sets remote_cmd (using variables
tail_lines and remote_log_file) to first validate tail_lines is a non-negative
integer (e.g. with a regex test like ^[0-9]+$ or by converting and checking
numeric) and only build remote_cmd when the check passes; if validation fails,
either unset tail_lines or error out early so no untrusted content is inserted
into the SSH-executed command.
| # t1165.3: Handle remote dispatch PID format "remote:host:pid" | ||
| if [[ "$pid" == remote:* ]]; then | ||
| local _remote_host _remote_pid | ||
| _remote_host=$(echo "$pid" | cut -d: -f2) | ||
| _remote_pid=$(echo "$pid" | cut -d: -f3) | ||
| local remote_helper="${SCRIPT_DIR}/../remote-dispatch-helper.sh" | ||
| if [[ -x "$remote_helper" ]]; then | ||
| if "$remote_helper" status "$tid" "$_remote_host" >/dev/null 2>&1; then | ||
| is_alive=true | ||
| else | ||
| # Remote worker finished — collect logs before evaluation | ||
| log_info " $tid: remote worker finished on $_remote_host, collecting logs..." | ||
| local collected_log | ||
| collected_log=$("$remote_helper" logs "$tid" "$_remote_host" 2>/dev/null) || collected_log="" | ||
| if [[ -n "$collected_log" && -f "$collected_log" ]]; then | ||
| # Update the task's log_file to point to the collected local copy | ||
| db "$SUPERVISOR_DB" "UPDATE tasks SET log_file = '$(sql_escape "$collected_log")' WHERE id = '$(sql_escape "$tid")';" 2>/dev/null || true | ||
| log_info " $tid: remote logs collected to $collected_log" | ||
| fi | ||
| fi | ||
| else | ||
| log_warn " $tid: remote-dispatch-helper.sh not found, cannot check remote worker" | ||
| fi | ||
| elif kill -0 "$pid" 2>/dev/null; then | ||
| is_alive=true | ||
| fi |
There was a problem hiding this comment.
Remote PID guard is incomplete — Phase 0.7, Phase 1c, and Phase 4 will misidentify remote workers as dead
The remote:* detection was added only here. Three other phases read PID files and call kill -0 without a matching guard — kill -0 "remote:host:pid" always silently fails because the argument is not numeric:
| Phase | Location | Consequence |
|---|---|---|
| Phase 0.7 | ~line 991 | stale_has_live_worker stays false → remote task re-queued or failed as stale |
| Phase 1c | ~line 1691 | stuck_alive stays false → remote task reaped from evaluating prematurely |
| Phase 4 health | ~line 2491 | rm -f "$pid_file" fires, then cmd_evaluate/cmd_transition("failed") runs against a still-running remote worker |
| live_pid_count | ~line 2821 | Always 0 for remote workers → spurious STATE MISMATCH warnings every pulse |
Phase 4 is the most destructive path: it deletes the remote:host:pid PID file (breaking future status checks for that task) and then tries to evaluate a task whose worker is still alive on the remote host.
Each of those PID-liveness checks should extract the stored PID, detect the remote:* prefix, and delegate to remote-dispatch-helper.sh (or at minimum skip kill -0) the same way this block does.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/pulse.sh around lines 1210 - 1235, The handling
of "remote:host:pid" is only implemented in one block; replicate that guard in
every place that calls kill -0 or treats PID files as numeric — specifically
update the logic around stale_has_live_worker (Phase 0.7), stuck_alive (Phase
1c), the Phase 4 health check that runs rm -f "$pid_file" and then
cmd_evaluate/cmd_transition("failed"), and the live_pid_count computation —
detect if pid starts with "remote:", parse host and remote pid like the existing
block (use SCRIPT_DIR/../remote-dispatch-helper.sh), call the helper with status
to decide liveness (and collect logs via logs when transitioning), set the same
is_alive/stale_has_live_worker/stuck_alive/live_pid_count behavior as local
PIDs, and skip rm -f and local kill -0 for remote entries so remote workers are
not misidentified as dead.
| _remote_host=$(echo "$pid" | cut -d: -f2) | ||
| _remote_pid=$(echo "$pid" | cut -d: -f3) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Locate and read the pulse.sh file around lines 1213-1214
fd -t f 'pulse.sh' -x cat -n {} | sed -n '1200,1230p'Repository: marcusquinn/aidevops
Length of output: 1722
🏁 Script executed:
# Find and examine the remote-dispatch-helper.sh file
fd -t f 'remote-dispatch-helper' -x cat -n {}Repository: marcusquinn/aidevops
Length of output: 39382
🏁 Script executed:
# Check for any usage of _remote_pid after line 1214 in pulse.sh
fd -t f 'pulse.sh' -x cat -n {} | sed -n '1214,1300p'Repository: marcusquinn/aidevops
Length of output: 3689
🏁 Script executed:
# Verify if _remote_pid is used anywhere downstream in pulse.sh after line 1214
fd -t f 'pulse.sh' -x grep -n "_remote_pid" {}Repository: marcusquinn/aidevops
Length of output: 155
Remove unused _remote_pid variable — dead code
_remote_pid (line 1214) is extracted from the remote:host:pid string but never passed to the helper. The status and logs subcommands accept only <task_id> and <host> arguments; they retrieve the remote PID internally from metadata using the task ID. Delete line 1214 and simplify the variable declaration on line 1212 to local _remote_host.
Also ensure this script runs through ShellCheck (-u flag) as part of pre-commit checks per coding guidelines for .agents/scripts/**/*.sh.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/pulse.sh around lines 1213 - 1214, Remove the
dead _remote_pid extraction: change the local declaration that extracts
remote:host:pid to only assign the host (use local _remote_host) and delete the
line creating _remote_pid; update any nearby references to only use _remote_host
when calling the helper for the status/logs subcommands (these accept <task_id>
and <host> only and obtain the PID from metadata). After editing, run ShellCheck
(-u) on .agents/scripts/supervisor/pulse.sh (and ensure pre-commit will run
ShellCheck for .agents/scripts/**/*.sh) to satisfy lint rules.
| else | ||
| log_warn " $tid: remote-dispatch-helper.sh not found, cannot check remote worker" | ||
| fi |
There was a problem hiding this comment.
is_alive defaults to false when helper is unavailable — premature task evaluation on infra failures
When remote-dispatch-helper.sh is not executable the warning is emitted but is_alive stays false, so the task is immediately pushed through the evaluation path as if its worker died. A transient permission change, deployment hiccup, or filesystem issue will cascade into incorrect retrying/failed transitions for every remote task.
The conservative default for an indeterminate remote state is to treat the worker as alive and wait for the next pulse to retry:
🛡️ Proposed fix — treat remote worker as alive when helper is unavailable
- else
- log_warn " $tid: remote-dispatch-helper.sh not found, cannot check remote worker"
- fi
+ else
+ log_warn " $tid: remote-dispatch-helper.sh not found or not executable — treating remote worker as alive until helper is available"
+ is_alive=true
+ fiThe AI summary states the code "falls back to the existing local PID check logic" when the helper is missing. It does not — the
elif kill -0 "$pid"branch is only reachable for non-remote:*PIDs; theelseclause inside theremote:*block has no fallback.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| else | |
| log_warn " $tid: remote-dispatch-helper.sh not found, cannot check remote worker" | |
| fi | |
| else | |
| log_warn " $tid: remote-dispatch-helper.sh not found or not executable — treating remote worker as alive until helper is available" | |
| is_alive=true | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/pulse.sh around lines 1230 - 1232, The current
logic leaves is_alive=false when remote-dispatch-helper.sh is not executable,
causing remote tasks to be treated as dead; update the remote task branch so
that when the helper cannot be invoked (the branch that currently calls log_warn
" $tid: remote-dispatch-helper.sh not found, cannot check remote worker") you
set is_alive=true (conservative default) and skip transitioning the task to
retry/failed; only run the existing remote health-check flow when the helper
executes successfully, and keep the existing local PID check (the elif kill -0
"$pid" branch) unchanged for non-remote PIDs.
| **Security notes**: | ||
|
|
||
| - SSH agent forwarding (`-A`) passes the local SSH agent socket, not the keys themselves | ||
| - API keys are passed as environment variables to the remote command | ||
| - Keys are NOT written to disk on the remote host | ||
| - The remote workspace is cleaned up after task completion |
There was a problem hiding this comment.
Security notes should mention /proc/<pid>/environ exposure risk for environment-variable-based credential forwarding.
Line 149 states "Keys are NOT written to disk on the remote host," but environment variables are readable from /proc/<pid>/environ by the same user (and root) on Linux. While acceptable for most use cases, this is worth a brief mention for security-conscious deployments.
Additionally, if the remote SSH session's AcceptEnv or SendEnv isn't configured, the env-var forwarding approach may silently fail — the helper likely uses a different mechanism (e.g., inline VAR=val command), but the docs should clarify the actual transport method.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/tools/containers/remote-dispatch.md around lines 146 - 151, Update
the "Security notes" section to explicitly mention that
environment-variable-based credential forwarding exposes values to
/proc/<pid>/environ (readable by the same user and root on Linux) so keys aren't
written to disk but can still be read by processes with appropriate privileges,
and add a brief caveat for security-conscious deployments; also clarify the
exact transport mechanism used by the helper (whether it relies on SSH
AcceptEnv/SendEnv or uses inline exported variables like VAR=val command) so
readers know if SSH server/client env propagation can silently fail.
| The remote host needs `opencode` or `claude` CLI installed. Install via: | ||
|
|
||
| ```bash | ||
| # On the remote host | ||
| npm install -g @anthropic-ai/claude-code | ||
| # or | ||
| curl -fsSL https://opencode.ai/install | bash | ||
| ``` |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
opencode CLI install command 2025
💡 Result:
To install OpenCode (opencode) CLI, use:
curl -fsSL https://opencode.ai/install | bash[1][2]
Alternative (Node.js):
npm install -g opencode-ai[2]
macOS/Linux (Homebrew):
brew install opencode[3]
Sources: [1] [2] [3]
Update npm package name for opencode CLI.
Line 263 uses @anthropic-ai/claude-code, but the current npm package is opencode-ai. Update to:
npm install -g opencode-aiThe curl-based install at line 269 is correct and current.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/tools/containers/remote-dispatch.md around lines 262 - 269, Update
the npm install line to use the current package name: replace the existing `npm
install -g `@anthropic-ai/claude-code`` invocation with `npm install -g
opencode-ai` so the remote host installs the correct opencode CLI; leave the
curl-based installer (`curl -fsSL https://opencode.ai/install | bash`)
unchanged.
Auto-dismissed: bot review does not block autonomous pipeline



Summary
Remote container support — dispatch AI workers to containers on remote hosts via SSH/Tailscale, with credential forwarding and log collection.
Ref #1765
Changes
New:
remote-dispatch-helper.sh(1057 lines)add,remove,hosts,checkcommands for configuring remote hosts*.ts.net,100.x.x.x), usestailscale sshwhen available, falls back to regular SSH-A) for git auth, API key env vars (GH_TOKEN,ANTHROPIC_API_KEY,OPENROUTER_API_KEY,GOOGLE_API_KEY)nohup/setsid(ordocker execfor container targets)--follow) and batch collection from remote hostsModified:
supervisor/dispatch.shcmd_dispatch(): Detectsdispatch_targetcolumn on tasks, routes toremote-dispatch-helper.shinstead of local process spawncmd_worker_status(): Handlesremote:host:pidPID format for remote workersModified:
supervisor/pulse.shlog_fileto local copy for normal evaluation flowModified:
supervisor/database.shdispatch_target TEXT DEFAULT NULLcolumn on tasks tableCREATE TABLE tasksNew:
tools/containers/remote-dispatch.mdsubagent-index.toon,AGENTS.mddomain index,headless-dispatch.mdcross-referenceArchitecture
Verification
.shfilesSummary by CodeRabbit
Release Notes
New Features
Documentation