t1164: End-to-end verification of dual-CLI architecture by marcusquinn · Pull Request #2105 · marcusquinn/aidevops

marcusquinn · 2026-02-21T09:59:54Z

WIP - End-to-end verification of dual-CLI architecture (t1164)

Incremental commits. Verifies: CLI routing, OAuth detection, auth failure fallback, cost tracking, no regressions.

Summary by CodeRabbit

Tests
- Added comprehensive end-to-end test suite validating dual CLI orchestration, routing logic, command construction, authentication flows, cost tracking, and error handling.
Chores
- Updated deprecation messaging for CLI fallback.

coderabbitai · 2026-02-21T10:00:03Z

Walkthrough

This PR updates logging semantics in the supervisor dispatch script and introduces a comprehensive end-to-end test suite that validates supervisor orchestration logic across CLI resolution, command construction, model routing, authentication flows, cost tracking, batch dispatch, escalation, and error handling scenarios.

Changes

Cohort / File(s)	Summary
Dispatch Logging Update `.agents/scripts/supervisor/dispatch.sh`	Minor log message revision: replaced `log_warning` with `log_warn` and updated fallback message text from "no opencode" to "deprecated" messaging. Control flow unchanged.
Comprehensive E2E Test Suite `tests/test-dual-cli-e2e.sh`	New 1286-line test script with mock opencode/claude CLI binaries, isolated test environment, SQLite database validation, and 11+ helper functions (pass/fail/skip reporting, supervisor invocation, DB querying, log creation, environment isolation). Validates CLI resolution, command construction, model tier handling, OAuth flows, cost tracking, batch dispatch, escalation, dedup guards, health checks, and error scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

Two CLIs dance beneath the supervisor's watchful eye,
One deprecated, one new—orchestrated with care,
Tests prove the dispatch flows true, no chance to fail,
Mocks capture every branch, every path,
Zero debt stands tall—A-grade DevOps craft. 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly identifies the main change: an end-to-end verification test for the dual-CLI architecture, directly corresponding to the substantial new test file and minor bug fix in the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/t1164

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2026-02-21T10:13:16Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on robust end-to-end verification of the system's dual-CLI architecture. It introduces extensive testing to confirm that tasks are correctly routed to the appropriate CLI, models are resolved based on complexity and configuration, and various failure scenarios, particularly around authentication and cost, are handled gracefully. The changes ensure the stability and reliability of the dispatch mechanism without introducing regressions.

Highlights

Dual-CLI Architecture Verification: Introduced a new comprehensive end-to-end test suite to verify the dual-CLI architecture, ensuring proper routing between opencode (primary) and claude (deprecated fallback) CLIs.
CLI Routing and Command Generation: Verified that build_cli_cmd correctly generates commands for both opencode and claude CLIs, handling their distinct argument structures (e.g., run subcommand vs. -p flag, --format vs. --output-format, and provider prefix stripping).
Model Resolution and Tier Classification: Confirmed correct model resolution based on tier names (haiku, sonnet, opus) and full model strings, including the SUPERVISOR_MODEL environment variable override and tag-based task complexity classification.
Authentication and Cost Tracking: Validated OAuth detection, graceful fallback on authentication failures (e.g., permission denied, 401 Unauthorized, credits exhausted), and accurate recording of requested_tier and actual_tier for cost tracking.
Regression Prevention and Failure Handling: Ensured no regressions in the pure opencode dispatch path, and verified the functionality of the Dispatch Dedup Guard (t1206) and Prompt-Repeat Strategy (t1097) for handling task failures.
Minor Logging Refactor: Updated a deprecated log_warning function call to log_warn in the supervisor dispatch script for consistency.

Changelog

.agents/scripts/supervisor/dispatch.sh
- Updated logging function call from log_warning to log_warn.
tests/test-dual-cli-e2e.sh
- Added a new end-to-end test script for dual-CLI architecture verification.
- Implemented mock binaries for opencode and claude to simulate CLI behavior.
- Included tests for CLI resolution, command generation, model tier classification, and cost tracking.
- Added tests for OAuth detection, authentication failure handling, and prompt repeat strategy.

Activity

The pull request is a Work In Progress (WIP) and includes incremental commits.
It aims for end-to-end verification of the dual-CLI architecture, referencing issue t1164: End-to-end verification of dual-CLI architecture #1761.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

The pull request introduces a comprehensive end-to-end test suite for the dual-CLI architecture, which is a valuable addition for verifying routing and fallback logic. However, the new test script contains several violations of the repository's shell scripting standards, particularly regarding explicit return statements, variable declaration patterns, and blanket error suppression. Adhering to these standards ensures consistency and improves the debuggability of the test framework.

tests/test-dual-cli-e2e.sh

The deprecated claude CLI fallback path in resolve_ai_cli() called log_warning() which does not exist — the correct function is log_warn(). Under set -euo pipefail, this caused the fallback to silently fail, making the claude CLI unreachable even when installed. Discovered during dual-CLI architecture verification testing.

github-actions · 2026-02-21T13:27:05Z

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 34 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Feb 21 13:27:01 UTC 2026: Code review monitoring started
Sat Feb 21 13:27:01 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 34

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 34
VULNERABILITIES: 0

Generated on: Sat Feb 21 13:27:04 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-21T13:27:57Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

🧹 Nitpick comments (4)

tests/test-dual-cli-e2e.sh (4)

431-496: Missing test coverage for Claude CLI --mcp-config flag in build_cli_cmd.

Section 3 tests Claude's run, version, and probe commands but doesn't verify the --mcp-config / --strict-mcp-config flag injection (dispatch.sh lines 2198-2202). Since MCP worker isolation is a key part of the dual-CLI architecture (t1162), consider adding a test:

💡 Suggested test for MCP config flag

# Test: claude run with --mcp-config
cl_mcp_cmd=$(run_in_supervisor_env "
    build_cli_cmd --cli claude --action run --output array \
        --model 'anthropic/claude-sonnet-4-6' \
        --prompt 'Test prompt' \
        --mcp-config '/tmp/test-mcp.json'
")

if echo "$cl_mcp_cmd" | grep -q "\-\-mcp-config" && echo "$cl_mcp_cmd" | grep -q "\-\-strict-mcp-config"; then
    pass "Claude run: includes --mcp-config and --strict-mcp-config"
else
    fail "Claude run: should include MCP config flags" "Got: $cl_mcp_cmd"
fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test-dual-cli-e2e.sh` around lines 431 - 496, Tests for the Claude CLI
in section "3. build_cli_cmd for Claude" are missing coverage for injecting MCP
isolation flags; update the tests to call build_cli_cmd with --mcp-config (and a
sample path) and assert the produced command string includes both --mcp-config
and --strict-mcp-config as handled by the build_cli_cmd logic (see build_cli_cmd
and the dispatch.sh MCP injection). Add a new block similar to the suggested
snippet that captures cl_mcp_cmd via run_in_supervisor_env and uses
grep/conditionals to pass/fail based on presence of the two flags.

1138-1193: Dead test code: pr_auth is computed but never asserted.

Lines 1138–1172 compute pr_auth but no assertion references it. The actual assertion is on pr_reason (line 1189). Additionally, the || true on line 1170 means $? on line 1171 is always 0, making the exit code capture meaningless.

You can safely remove the pr_auth block (lines 1138–1172) since pr_reason already covers the same test:

🧹 Remove dead test code

 # Test: non-retryable failures are not eligible
-pr_auth=$(bash -c "
-    export AIDEVOPS_SUPERVISOR_DIR='$TEST_DIR/supervisor'
-    export SUPERVISOR_PROMPT_REPEAT_ENABLED=true
-    BLUE='' GREEN='' YELLOW='' RED='' NC=''
-    SUPERVISOR_LOG='/dev/null'
-    SUPERVISOR_DIR='$TEST_DIR/supervisor'
-    SUPERVISOR_DB='$TEST_DIR/supervisor/supervisor.db'
-    source '$SHARED_CONSTANTS'
-    source '$SUPERVISOR_DIR_MODULE/_common.sh'
-    source '$SUPERVISOR_DIR_MODULE/database.sh'
-    source '$SUPERVISOR_DIR_MODULE/dispatch.sh'
-    should_prompt_repeat 'pr-t1' 'auth_error' || true
-    echo \"exit:\$?\"
-" 2>/dev/null | grep "^exit:" | head -1)
-
-# The function outputs the reason then returns 1 for non-retryable
 pr_reason=$(bash -c "

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test-dual-cli-e2e.sh` around lines 1138 - 1193, Remove the dead pr_auth
test block (the bash subshell that sets pr_auth and the associated exit capture)
since pr_reason already validates non-retryable behavior; delete the unused
variable pr_auth and the redundant "|| true" that forces a zero exit status so
the exit code capture is meaningless, and ensure the test relies on
should_prompt_repeat via pr_reason (the existing assertion that greps for
"non_retryable") to validate auth_error behavior.

221-234: run_in_supervisor_env sources modules and evaluates $1 directly in a subshell.

Line 232 expands $1 directly inside bash -c, meaning arbitrary shell code can execute. This is by design for the test helper, but note that 2>/dev/null on line 233 swallows all stderr — including any unexpected error output from the sourced modules. If a test is mysteriously failing, this suppression can mask the root cause.

Consider adding a --verbose mode that conditionally preserves stderr:

💡 Optional: verbose-aware stderr handling

 run_in_supervisor_env() {
-	bash -c "
+	local stderr_dest="/dev/null"
+	[[ "$VERBOSE" == "--verbose" ]] && stderr_dest="/dev/stderr"
+	bash -c "
         export AIDEVOPS_SUPERVISOR_DIR='$TEST_DIR/supervisor'
         export PATH='$MOCK_BIN:\$PATH'
         BLUE='' GREEN='' YELLOW='' RED='' NC=''
         SUPERVISOR_LOG='/dev/null'
         SUPERVISOR_DIR='$TEST_DIR/supervisor'
         SUPERVISOR_DB='$TEST_DIR/supervisor/supervisor.db'
         source '$SHARED_CONSTANTS'
         source '$SUPERVISOR_DIR_MODULE/_common.sh'
         source '$SUPERVISOR_DIR_MODULE/dispatch.sh'
         $1
-    " 2>/dev/null
+    " 2>"$stderr_dest"
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test-dual-cli-e2e.sh` around lines 221 - 234, The helper
run_in_supervisor_env currently hard-suppresses stderr by appending
"2>/dev/null" to the bash -c invocation; change it to honor a verbose flag so
errors from the sourced modules (_common.sh, dispatch.sh) are visible when
needed: add a check for an environment variable (e.g., AIDEVOPS_VERBOSE) or
accept a second parameter and, if verbose is set, omit the "2>/dev/null"
redirection (or redirect stderr to a log file) otherwise keep the current
suppression; ensure default behavior remains suppressed and reference
run_in_supervisor_env and the sourced SUPERVISOR_DIR_MODULE files when making
the change.

833-843: Batch CLI routing test doesn't vary by model — validates default, not per-model routing.

The loop calls resolve_ai_cli without passing a model argument for each task, so it only confirms the default CLI is opencode — not that OAuth-aware routing would correctly handle Anthropic vs. non-Anthropic models. The test title says "all tasks route through the same CLI" which is accurate for the no-OAuth case, but consider adding a comment clarifying this is the "no OAuth configured" scenario.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test-dual-cli-e2e.sh` around lines 833 - 843, The loop currently
invokes resolve_ai_cli with no model, so it only validates the default CLI;
update the loop to pass an explicit model for each task (map each task_id to the
intended model and call resolve_ai_cli "$model" inside run_in_supervisor_env) so
the test exercises per-model routing logic, referencing resolve_ai_cli,
run_in_supervisor_env and task_id; alternatively, if you intend to test the
no-OAuth default only, add a concise comment above the loop stating this is the
"no OAuth configured" scenario.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test-dual-cli-e2e.sh`:
- Around line 431-496: Tests for the Claude CLI in section "3. build_cli_cmd for
Claude" are missing coverage for injecting MCP isolation flags; update the tests
to call build_cli_cmd with --mcp-config (and a sample path) and assert the
produced command string includes both --mcp-config and --strict-mcp-config as
handled by the build_cli_cmd logic (see build_cli_cmd and the dispatch.sh MCP
injection). Add a new block similar to the suggested snippet that captures
cl_mcp_cmd via run_in_supervisor_env and uses grep/conditionals to pass/fail
based on presence of the two flags.
- Around line 1138-1193: Remove the dead pr_auth test block (the bash subshell
that sets pr_auth and the associated exit capture) since pr_reason already
validates non-retryable behavior; delete the unused variable pr_auth and the
redundant "|| true" that forces a zero exit status so the exit code capture is
meaningless, and ensure the test relies on should_prompt_repeat via pr_reason
(the existing assertion that greps for "non_retryable") to validate auth_error
behavior.
- Around line 221-234: The helper run_in_supervisor_env currently
hard-suppresses stderr by appending "2>/dev/null" to the bash -c invocation;
change it to honor a verbose flag so errors from the sourced modules
(_common.sh, dispatch.sh) are visible when needed: add a check for an
environment variable (e.g., AIDEVOPS_VERBOSE) or accept a second parameter and,
if verbose is set, omit the "2>/dev/null" redirection (or redirect stderr to a
log file) otherwise keep the current suppression; ensure default behavior
remains suppressed and reference run_in_supervisor_env and the sourced
SUPERVISOR_DIR_MODULE files when making the change.
- Around line 833-843: The loop currently invokes resolve_ai_cli with no model,
so it only validates the default CLI; update the loop to pass an explicit model
for each task (map each task_id to the intended model and call resolve_ai_cli
"$model" inside run_in_supervisor_env) so the test exercises per-model routing
logic, referencing resolve_ai_cli, run_in_supervisor_env and task_id;
alternatively, if you intend to test the no-OAuth default only, add a concise
comment above the loop stating this is the "no OAuth configured" scenario.

marcusquinn marked this pull request as ready for review February 21, 2026 10:13

gemini-code-assist bot reviewed Feb 21, 2026

View reviewed changes

marcusquinn force-pushed the feature/t1164 branch from c69416b to 148af93 Compare February 21, 2026 13:26

coderabbitai bot reviewed Feb 21, 2026

View reviewed changes

coderabbitai bot approved these changes Feb 21, 2026

View reviewed changes

marcusquinn mentioned this pull request Feb 21, 2026

[Supervisor:marcusquinn] 0 queued, 0 working, 1 in review at 14:46 UTC #1502

Closed

marcusquinn merged commit 2c49877 into main Feb 21, 2026
11 checks passed

marcusquinn deleted the feature/t1164 branch February 21, 2026 13:53

marcusquinn added a commit that referenced this pull request Feb 21, 2026

chore: mark t1164 complete in TODO.md (#2105)

01b7cff

marcusquinn mentioned this pull request Feb 21, 2026

t1164: End-to-end verification of dual-CLI architecture #1761

Closed

28 tasks

github-actions bot mentioned this pull request Feb 21, 2026

t1160: Dual-CLI Architecture: OpenCode primary + Claude Code CLI fallback #1746

Closed

51 tasks

This was referenced Feb 21, 2026

t1165.4: Integration test — multi-container batch dispatch #2111

Merged

t1165: Verify — containerized Claude Code CLI instances for multi-subscription scaling #2180

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

t1164: End-to-end verification of dual-CLI architecture#2105

t1164: End-to-end verification of dual-CLI architecture#2105
marcusquinn merged 1 commit intomainfrom
feature/t1164

marcusquinn commented Feb 21, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 21, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist bot commented Feb 21, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Feb 21, 2026

Uh oh!

sonarqubecloud bot commented Feb 21, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

marcusquinn commented Feb 21, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist bot commented Feb 21, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Feb 21, 2026

🔍 Code Quality Report

📈 Current Quality Metrics

Uh oh!

sonarqubecloud bot commented Feb 21, 2026

Quality Gate passed

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

marcusquinn commented Feb 21, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 21, 2026 •

edited

Loading