
t1165.4: Integration test — multi-container batch dispatch#2111

Merged
marcusquinn merged 1 commit into main from feature/t1165.4
Feb 21, 2026

Conversation

@marcusquinn (Owner) commented Feb 21, 2026

Summary

  • Add comprehensive integration test for multi-container batch dispatch (t1165.4)
  • Verifies parallel workers, OAuth routing, container lifecycle, log aggregation, failure isolation, adaptive concurrency
  • 70 test assertions covering the full supervisor batch dispatch pipeline
  • All tests pass with 0 failures, 0 skips

Test Coverage

Area                   Tests
Batch creation             5
OAuth routing              6
Parallel dispatch          6
Log aggregation           12
Worker evaluation          4
Failure isolation          5
Second wave backfill       2
Batch completion           2
Cross-worker logs          4
Cleanup                    3
State audit trail          7
Adaptive concurrency       4
Edge cases                 4
Quality checks             3
Total                     70

Ref #1766

Summary by CodeRabbit

  • Tests
    • Added comprehensive integration test suite for multi-container batch dispatch workflow, including validation of task queueing, worker dispatch, OAuth routing, logging, lifecycle transitions, error handling, and batch completion detection.

@coderabbitai bot (Contributor) commented Feb 21, 2026

Walkthrough

A comprehensive Bash integration test script exercising end-to-end multi-container batch dispatch workflows, including task registration, parallel worker dispatch across containers, OAuth routing logic, lifecycle state transitions, logging isolation, failure recovery with backfill, and extensive edge case validation.

Changes

Cohort / File(s) Summary
Integration Test Suite
tests/test-multi-container-batch-dispatch.sh
New 1161-line test script validating multi-container batch dispatch orchestration: supervisor DB initialization, mock CLI tools (claude/opencode), task/batch registration, OAuth routing patterns, parallel worker dispatch with concurrency enforcement, lifecycle transitions (queued → dispatched → running → evaluating → complete), worker logging isolation with sentinels and heartbeats, mixed evaluation outcomes (success vs. rate-limit retries), failure isolation, backfill waves, batch completion detection, state-machine validation, and edge cases (empty batches, duplicates, release-on-complete behavior).
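The lifecycle chain described above (queued → dispatched → running → evaluating → complete) can be pictured as a tiny transition validator. The function name and allowed-transition table below are illustrative assumptions, not code from the repository:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the queued -> dispatched -> running -> evaluating
# -> complete state machine the walkthrough mentions. is_valid_transition
# is an assumed name, not a function from the test script itself.
is_valid_transition() {
	local from="$1" to="$2"
	case "$from:$to" in
		queued:dispatched|dispatched:running|running:evaluating|evaluating:complete)
			return 0
			;;
		*)
			return 1
			;;
	esac
}

is_valid_transition queued dispatched && echo "ok: queued -> dispatched"
is_valid_transition queued running || echo "rejected: queued -> running"
```

A state-machine validation test then boils down to asserting that every observed status change in the audit trail satisfies this predicate.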

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Poem

🚀 Batch dispatchers dance in tests so grand,
Workers queue up, container-bound and planned,
OAuth routes them left and right with flair,
Logs stay isolated in the digital air,
Edge cases tamed—automation's finest prayer! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 64.29%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled).
Title Check: ✅ Passed. The title clearly and specifically describes the primary change (adding an integration test for multi-container batch dispatch) with a ticket reference (t1165.4) for traceability.



@marcusquinn marcusquinn marked this pull request as ready for review February 21, 2026 16:22
@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a critical integration test designed to thoroughly validate the supervisor's behavior in a multi-container batch dispatch environment. It ensures that the system correctly handles parallel task execution, routes OAuth credentials appropriately, manages container lifecycles, aggregates logs effectively, isolates failures, and scales concurrency adaptively. This new test significantly enhances the reliability and stability of the batch processing pipeline.

Highlights

  • New Integration Test: Added a comprehensive integration test for multi-container batch dispatch (t1165.4) to validate the supervisor's orchestration logic.
  • Key Verifications: The test verifies parallel workers, correct OAuth routing, container lifecycle management, log aggregation, failure isolation, and adaptive concurrency.
  • Extensive Coverage: Includes 70 test assertions covering the full supervisor batch dispatch pipeline, ensuring robust functionality.
  • Test Status: All tests pass with 0 failures and 0 skips, indicating successful implementation.
Changelog
  • tests/test-multi-container-batch-dispatch.sh
    • Added a new comprehensive integration test script for multi-container batch dispatch.
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command              Description
Code Review           /gemini review       Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary      Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist  Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help         Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 38 code smells

[INFO] Recent monitoring activity:
Sat Feb 21 16:23:09 UTC 2026: Code review monitoring started
Sat Feb 21 16:23:09 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 38

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 38
  • VULNERABILITIES: 0

Generated on: Sat Feb 21 16:23:12 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive integration test for multi-container batch dispatch. The test script is well-structured and covers a wide range of scenarios, including parallel execution, OAuth routing, log aggregation, and failure isolation. My review focuses on adherence to the repository's shell scripting style guide and best practices. I've identified several areas for improvement, primarily related to explicit return statements, error handling, and output redirection, to enhance the script's robustness and maintainability. Overall, this is a valuable addition that significantly improves test coverage for the supervisor's batch processing capabilities.

export MOCK_CLAUDE_LOG="$TEST_DIR/mock-claude-invocations.log"
export MOCK_OPENCODE_LOG="$TEST_DIR/mock-opencode-invocations.log"

# shellcheck disable=SC2317,SC2329


medium

The repository style guide (line 14) requires a reason for disabling ShellCheck warnings. Please add a comment explaining why SC2317 and SC2329 are disabled for the cleanup function.

Suggested change
# shellcheck disable=SC2317,SC2329
# shellcheck disable=SC2317,SC2329 # SC2317: The trap is for EXIT, so commands after it are unreachable. SC2329: The trap is not meant to be inherited by subshells.
References
  1. ShellCheck violations require targeted inline disables with reason comments. (link)

Comment on lines +146 to +183
test_db() {
	sqlite3 -cmd ".timeout 5000" "$TEST_DIR/supervisor/supervisor.db" "$@"
}

# Helper: get task status
get_status() {
	test_db "SELECT status FROM tasks WHERE id = '$1';"
}

# Helper: get task field
get_field() {
	test_db "SELECT $2 FROM tasks WHERE id = '$1';"
}

# Helper: create a mock worker log file
create_log() {
	local task_id="$1"
	local content="$2"
	local log_file="$TEST_DIR/supervisor/logs/${task_id}.log"
	mkdir -p "$TEST_DIR/supervisor/logs"
	echo "$content" >"$log_file"
	test_db "UPDATE tasks SET log_file = '$log_file' WHERE id = '$task_id';"
	echo "$log_file"
}

# Helper: create a mock PID file for a running worker
create_pid_file() {
	local task_id="$1"
	local pid="$2"
	mkdir -p "$TEST_DIR/supervisor/pids"
	echo "$pid" >"$TEST_DIR/supervisor/pids/${task_id}.pid"
}

# Helper: count tasks in a given status
count_status() {
	local status="$1"
	test_db "SELECT count(*) FROM tasks WHERE status = '$status';"
}



medium

Several helper functions are missing explicit return statements, which is required by the repository's style guide (line 12). While the script may function correctly with set -e, adding explicit returns improves clarity and adherence to the coding standards.

For functions that wrap another command and should propagate its exit code, use return $?. For functions that are primarily used for their standard output and are expected to succeed, use return 0.

test_db() {
	sqlite3 -cmd ".timeout 5000" "$TEST_DIR/supervisor/supervisor.db" "$@"
	return $?
}

# Helper: get task status
get_status() {
	test_db "SELECT status FROM tasks WHERE id = '$1';"
	return $?
}

# Helper: get task field
get_field() {
	test_db "SELECT $2 FROM tasks WHERE id = '$1';"
	return $?
}

# Helper: create a mock worker log file
create_log() {
	local task_id="$1"
	local content="$2"
	local log_file="$TEST_DIR/supervisor/logs/${task_id}.log"
	mkdir -p "$TEST_DIR/supervisor/logs"
	echo "$content" >"$log_file"
	test_db "UPDATE tasks SET log_file = '$log_file' WHERE id = '$task_id';"
	echo "$log_file"
	return 0
}

# Helper: create a mock PID file for a running worker
create_pid_file() {
	local task_id="$1"
	local pid="$2"
	mkdir -p "$TEST_DIR/supervisor/pids"
	echo "$pid" >"$TEST_DIR/supervisor/pids/${task_id}.pid"
	return $?
}

# Helper: count tasks in a given status
count_status() {
	local status="$1"
	test_db "SELECT count(*) FROM tasks WHERE status = '$status';"
	return $?
}
References
  1. All functions must have explicit return statements. (link)
  2. All functions in shell scripts must have explicit return statements to ensure predictable behavior and exit code propagation.

export HOME=\$mock_home
mkdir -p "\$mock_home/.claude"
echo '{"hasCompletedOnboarding":true}' > "\$mock_home/.claude/settings.json"
rm -f '$TEST_DIR/supervisor/health/claude-oauth' 2>/dev/null


medium

The -f flag for rm already suppresses errors for non-existent files, so 2>/dev/null is redundant here. It can also hide other errors like permission issues, which goes against the repository's general rules. Please remove 2>/dev/null from this line and similar occurrences on lines 357, 388, 419, 453, and 484.

Suggested change
rm -f '$TEST_DIR/supervisor/health/claude-oauth' 2>/dev/null
rm -f '$TEST_DIR/supervisor/health/claude-oauth'
References
  1. Avoid using 2>/dev/null to suppress errors on file operations if the file's existence has already been verified by a preceding check (e.g., [[ -f "$file" ]] or an early return). This practice is redundant for 'file not found' errors and can mask other important issues like permissions problems.
  2. Avoid using 2>/dev/null for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

# Verify WRAPPER_STARTED sentinel in all logs
for tid in mc-t1 mc-t2 mc-t3; do
	log_file=$(get_field "$tid" "log_file")
	if grep -q "WRAPPER_STARTED task_id=$tid" "$log_file" 2>/dev/null; then


medium

Suppressing stderr with 2>/dev/null can hide important errors, such as file permission issues, making debugging harder. It's better to let potential errors be visible. Please remove 2>/dev/null from this line and similar grep calls on lines 610, 621, 880, and 894.

Suggested change
if grep -q "WRAPPER_STARTED task_id=$tid" "$log_file" 2>/dev/null; then
if grep -q "WRAPPER_STARTED task_id=$tid" "$log_file"; then
References
  1. Avoid using 2>/dev/null for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

section "Cross-Cutting Quality"

# Test: supervisor-helper.sh passes bash syntax check
if bash -n "$SUPERVISOR_SCRIPT" 2>/dev/null; then


medium

Suppressing stderr for syntax checks with 2>/dev/null hides the actual error messages, making it difficult to debug test failures. Please remove 2>/dev/null to allow error messages to be printed. This also applies to the checks on lines 1122 and 1135.

Suggested change
if bash -n "$SUPERVISOR_SCRIPT" 2>/dev/null; then
if bash -n "$SUPERVISOR_SCRIPT"; then
References
  1. Avoid using 2>/dev/null for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

Comprehensive integration test verifying:
- Parallel workers across containers (batch concurrency enforcement)
- Correct OAuth routing (claude for Anthropic, opencode for non-Anthropic)
- Container lifecycle (dispatch -> running -> evaluate -> cleanup)
- Log aggregation across parallel workers (sentinels, per-task logs)
- Batch completion detection and post-completion hooks
- Adaptive concurrency scaling
- Cross-container failure isolation (one worker fails, others continue)
- Second dispatch wave backfill after completions
- Edge cases (empty batch, single-task, duplicates, release-on-complete)

70 test assertions covering the full supervisor batch dispatch pipeline.
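The OAuth routing rule the commit message states (claude for Anthropic models, opencode for everything else) can be illustrated with a minimal sketch. The real resolve_ai_cli lives in the supervisor's dispatch module; the body below is an assumption derived only from the rule above, including the provider-prefix convention:

```shell
#!/usr/bin/env bash
# Illustrative-only routing rule: Anthropic models route to the claude CLI,
# all others to opencode. The "provider/model" prefix convention and the
# example model names are assumptions, not the repository's actual logic.
resolve_ai_cli() {
	local model="$1"
	case "$model" in
		anthropic/*) echo "claude" ;;
		*)           echo "opencode" ;;
	esac
}

resolve_ai_cli 'anthropic/claude-opus-4-6'   # prints: claude
resolve_ai_cli 'some-provider/other-model'   # prints: opencode
```

The integration test exercises this decision under both SUPERVISOR_PREFER_OAUTH settings, which is why each routing case needs its own isolated environment.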
@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 38 code smells

[INFO] Recent monitoring activity:
Sat Feb 21 17:34:02 UTC 2026: Code review monitoring started
Sat Feb 21 17:34:02 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 38

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 38
  • VULNERABILITIES: 0

Generated on: Sat Feb 21 17:34:05 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud

coderabbitai bot (Contributor) previously requested changes Feb 21, 2026

@coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
tests/test-multi-container-batch-dispatch.sh (5)

186-194: _run_isolated_test swallows all stderr, making failures hard to diagnose.

Line 190 redirects all stderr to /dev/null. If a sourced module or the test body produces an unexpected error, it will be silently eaten. Consider routing stderr to a temp file and dumping it only on failure, or honoring the --verbose flag.

Sketch
-	bash "$test_script" 2>/dev/null
+	local stderr_file="$TEST_DIR/isolated-stderr-$$.log"
+	bash "$test_script" 2>"$stderr_file"
 	local rc=$?
+	if [[ $rc -ne 0 && "$VERBOSE" == "--verbose" ]]; then
+		cat "$stderr_file" >&2
+	fi
+	rm -f "$stderr_file"
 	rm -f "$test_script"
 	return $rc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-multi-container-batch-dispatch.sh` around lines 186 - 194, The
helper _run_isolated_test currently discards all stderr by running bash
"$test_script" 2>/dev/null; change this to capture stderr to a temp file (e.g.,
"${test_script}.err") and only suppress output on success, printing the stderr
file on non-zero exit (or when a global verbose flag is set) so test failures
and sourced-module errors are visible; update the function to write stderr to
that temp file, check the return code from bash, and cat the stderr file on
failure or when --verbose is enabled before cleaning up.

308-494: Heavy duplication across six OAuth routing test blocks.

Each of the six _run_isolated_test heredocs (lines 309–329, 340–360, 371–391, 402–422, 436–456, 467–487) repeats ~18 identical setup lines, varying only SUPERVISOR_PREFER_OAUTH and the model argument to resolve_ai_cli. Extracting a helper would cut ~170 lines of near-identical code and make adding new routing test cases trivial.

Sketch: extract a reusable routing test helper
+# Helper: resolve CLI for a given model under specified OAuth preference
+resolve_cli_for_model() {
+    local model="$1"
+    local prefer_oauth="${2:-true}"
+    _run_isolated_test <<EOF
+#!/usr/bin/env bash
+set -euo pipefail
+export AIDEVOPS_SUPERVISOR_DIR='$TEST_DIR/supervisor'
+export PATH='$MOCK_BIN':/usr/bin:/bin
+export SUPERVISOR_PREFER_OAUTH=$prefer_oauth
+unset SUPERVISOR_CLI 2>/dev/null || true
+BLUE='' GREEN='' YELLOW='' RED='' NC=''
+SUPERVISOR_LOG='/dev/null'
+SUPERVISOR_DIR='$TEST_DIR/supervisor'
+source '$SHARED_CONSTANTS'
+source '$SUPERVISOR_DIR_MODULE/_common.sh'
+source '$SUPERVISOR_DIR_MODULE/dispatch.sh'
+mock_home=\$(mktemp -d)
+export HOME=\$mock_home
+mkdir -p "\$mock_home/.claude"
+echo '{"hasCompletedOnboarding":true}' > "\$mock_home/.claude/settings.json"
+rm -f '$TEST_DIR/supervisor/health/claude-oauth' 2>/dev/null
+resolve_ai_cli '$model'
+rm -rf "\$mock_home"
+EOF
+}

Then each test collapses to:

oauth_opus_cli=$(resolve_cli_for_model 'anthropic/claude-opus-4-6' true)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-multi-container-batch-dispatch.sh` around lines 308 - 494, The
tests duplicate the same ~18-line setup across multiple _run_isolated_test
heredocs around resolve_ai_cli; extract a helper function (e.g.,
resolve_cli_for_model) that accepts the model string and a
SUPERVISOR_PREFER_OAUTH flag, builds and runs the single heredoc calling
_run_isolated_test and resolve_ai_cli (reusing the same HOME/mock setup,
sourcing _common.sh and dispatch.sh, and cleanup), then replace the six repeated
blocks with calls like oauth_opus_cli=$(resolve_cli_for_model
'anthropic/claude-opus-4-6' true) (ensure the helper toggles
SUPERVISOR_PREFER_OAUTH and unsets SUPERVISOR_CLI as before and returns the
resolve_ai_cli output).

140-143: Redundant return $?.

In Bash, a function implicitly returns the exit code of its last command. The explicit return $? on line 142 is a no-op. Not harmful, but removing it keeps the helper minimal.

Proposed fix
 sup() {
-	bash "$SUPERVISOR_SCRIPT" "$@" 2>&1
-	return $?
+	bash "$SUPERVISOR_SCRIPT" "$@" 2>&1
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-multi-container-batch-dispatch.sh` around lines 140 - 143, The
helper function sup currently ends with an unnecessary explicit "return $?"
which is redundant because Bash functions return the exit status of the last
command; remove the "return $?" line from the sup() function (which runs bash
"$SUPERVISOR_SCRIPT" "$@" 2>&1) so the function simply invokes that command and
lets its exit status propagate implicitly.

1018-1022: Adaptive concurrency assertion only checks for "is a number", not the expected value.

The test calls calculate_adaptive_concurrency 3 2 6 but only verifies the result is a non-empty integer. It doesn't assert the expected output (e.g., 6 if the formula is min(base * factor, max)). This makes it a smoke test rather than a correctness test. Consider asserting the expected value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-multi-container-batch-dispatch.sh` around lines 1018 - 1022, The
test only verifies that adaptive_result is a non-empty integer; change it to
compute the expected value (e.g., expected = min(base * factor, max) for the
call calculate_adaptive_concurrency 3 2 6 → expected 6) and assert
adaptive_result equals that expected value instead of only matching a number;
update the conditional that currently checks "$adaptive_result" =~ ^[0-9]+$ to
compare numeric equality between adaptive_result and the calculated expected,
and produce the pass/skip messages referencing calculate_adaptive_concurrency
and adaptive_result accordingly.
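If the formula really is min(base * factor, max), the tightened assertion could look like the sketch below. The reimplementation of calculate_adaptive_concurrency here is only a stand-in so the snippet runs on its own; the assertion pattern is the point:

```shell
#!/usr/bin/env bash
# Stand-in for the real calculate_adaptive_concurrency, assuming the formula
# min(base * factor, max). Only the exact-value assertion is what the review
# comment recommends; the function body itself is hypothetical.
calculate_adaptive_concurrency() {
	local base="$1" factor="$2" max="$3"
	local result=$((base * factor))
	if ((result > max)); then
		result=$max
	fi
	echo "$result"
}

adaptive_result=$(calculate_adaptive_concurrency 3 2 6)
expected=6
if [[ "$adaptive_result" -eq "$expected" ]]; then
	echo "PASS: calculate_adaptive_concurrency 3 2 6 -> $adaptive_result"
else
	echo "FAIL: expected $expected, got $adaptive_result"
fi
```

Asserting equality against a computed expectation turns the smoke test into a correctness test without adding any new dependencies.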

88-102: Mock default log paths fall back to shared /tmp locations.

Lines 90 and 108 define fallback paths (/tmp/mock-claude-invocations.log, /tmp/mock-opencode-invocations.log) inside the heredoc. While MOCK_CLAUDE_LOG / MOCK_OPENCODE_LOG are exported on lines 120–121 to override these defaults, the fallback creates a small risk: if the env var isn't inherited (e.g., a subshell env -i scenario), logs land in a shared /tmp file, potentially colliding across parallel CI runs or leaking test artifact details.

Consider pointing the fallback at a path that's guaranteed unique (e.g., derived from $$ or mktemp), or removing the default entirely and failing fast if the env var is unset.

Proposed fix
-MOCK_LOG="${MOCK_CLAUDE_LOG:-/tmp/mock-claude-invocations.log}"
+MOCK_LOG="${MOCK_CLAUDE_LOG:?MOCK_CLAUDE_LOG must be set}"
-MOCK_LOG="${MOCK_OPENCODE_LOG:-/tmp/mock-opencode-invocations.log}"
+MOCK_LOG="${MOCK_OPENCODE_LOG:?MOCK_OPENCODE_LOG must be set}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-multi-container-batch-dispatch.sh` around lines 88 - 102, The mock
scripts embed unsafe shared /tmp fallbacks for MOCK_CLAUDE_LOG and
MOCK_OPENCODE_LOG inside the heredocs which can collide in CI; update the
heredoc defaults to use unique per-process names (e.g., derive from $$ or use
mktemp) or remove the inline fallback entirely and make the script fail fast if
MOCK_CLAUDE_LOG / MOCK_OPENCODE_LOG are not set (these env vars are exported
later in the test harness), so that logs never silently land in a global /tmp
file.

Comment on lines +910 to +920
pid_count=0
for tid in mc-t1 mc-t2 mc-t3 mc-t4 mc-t5 mc-t6; do
	if [[ -f "$TEST_DIR/supervisor/pids/${tid}.pid" ]]; then
		pid_count=$((pid_count + 1))
	fi
done
if [[ "$pid_count" -ge 3 ]]; then
	pass "PID files exist for dispatched workers ($pid_count found)"
else
	fail "Expected >= 3 PID files, got $pid_count"
fi


⚠️ Potential issue | 🟡 Minor

Loose assertion: PID file count check uses >= 3 when all 6 should exist.

PID files were created for mc-t1..mc-t3 (line 528) and mc-t4..mc-t6 (line 767), so all 6 should be present before any cleanup runs. The >= 3 threshold could silently pass even if half the PID files were never written.

Proposed fix
-if [[ "$pid_count" -ge 3 ]]; then
-	pass "PID files exist for dispatched workers ($pid_count found)"
+if [[ "$pid_count" -eq 6 ]]; then
+	pass "PID files exist for all 6 dispatched workers"
 else
-	fail "Expected >= 3 PID files, got $pid_count"
+	fail "Expected 6 PID files, got $pid_count"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-multi-container-batch-dispatch.sh` around lines 910 - 920, The PID
file existence assertion is too loose—update the check that iterates over tid in
mc-t1..mc-t6 (the pid_count logic that tests
"$TEST_DIR/supervisor/pids/${tid}.pid") to require all 6 PID files instead of
">= 3"; change the conditional to compare against 6 and adjust the pass/fail
messages ("PID files exist for dispatched workers ($pid_count found)" and
"Expected >= 3 PID files, got $pid_count") to reflect that 6 are expected so
failures are clear.
