Skip to content

t1164: End-to-end verification of dual-CLI architecture#2105

Merged
marcusquinn merged 1 commit intomainfrom
feature/t1164
Feb 21, 2026
Merged

t1164: End-to-end verification of dual-CLI architecture#2105
marcusquinn merged 1 commit intomainfrom
feature/t1164

Conversation

@marcusquinn
Copy link
Owner

@marcusquinn marcusquinn commented Feb 21, 2026

WIP - End-to-end verification of dual-CLI architecture (t1164)

Incremental commits. Verifies: CLI routing, OAuth detection, auth failure fallback, cost tracking, no regressions.

Ref #1761

Summary by CodeRabbit

  • Tests

    • Added comprehensive end-to-end test suite validating dual CLI orchestration, routing logic, command construction, authentication flows, cost tracking, and error handling.
  • Chores

    • Updated deprecation messaging for CLI fallback.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 21, 2026

Walkthrough

This PR updates logging semantics in the supervisor dispatch script and introduces a comprehensive end-to-end test suite that validates supervisor orchestration logic across CLI resolution, command construction, model routing, authentication flows, cost tracking, batch dispatch, escalation, and error handling scenarios.

Changes

Cohort / File(s) Summary
Dispatch Logging Update
.agents/scripts/supervisor/dispatch.sh
Minor log message revision: replaced log_warning with log_warn and updated fallback message text from "no opencode" to "deprecated" messaging. Control flow unchanged.
Comprehensive E2E Test Suite
tests/test-dual-cli-e2e.sh
New 1286-line test script with mock opencode/claude CLI binaries, isolated test environment, SQLite database validation, and 11+ helper functions (pass/fail/skip reporting, supervisor invocation, DB querying, log creation, environment isolation). Validates CLI resolution, command construction, model tier handling, OAuth flows, cost tracking, batch dispatch, escalation, dedup guards, health checks, and error scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

Two CLIs dance beneath the supervisor's watchful eye,
One deprecated, one new—orchestrated with care,
Tests prove the dispatch flows true, no chance to fail,
Mocks capture every branch, every path,
Zero debt stands tall—A-grade DevOps craft. 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the main change: an end-to-end verification test for the dual-CLI architecture, directly corresponding to the substantial new test file and minor bug fix in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/t1164

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@marcusquinn marcusquinn marked this pull request as ready for review February 21, 2026 10:13
@gemini-code-assist
Copy link

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on robust end-to-end verification of the system's dual-CLI architecture. It introduces extensive testing to confirm that tasks are correctly routed to the appropriate CLI, models are resolved based on complexity and configuration, and various failure scenarios, particularly around authentication and cost, are handled gracefully. The changes ensure the stability and reliability of the dispatch mechanism without introducing regressions.

Highlights

  • Dual-CLI Architecture Verification: Introduced a new comprehensive end-to-end test suite to verify the dual-CLI architecture, ensuring proper routing between opencode (primary) and claude (deprecated fallback) CLIs.
  • CLI Routing and Command Generation: Verified that build_cli_cmd correctly generates commands for both opencode and claude CLIs, handling their distinct argument structures (e.g., run subcommand vs. -p flag, --format vs. --output-format, and provider prefix stripping).
  • Model Resolution and Tier Classification: Confirmed correct model resolution based on tier names (haiku, sonnet, opus) and full model strings, including the SUPERVISOR_MODEL environment variable override and tag-based task complexity classification.
  • Authentication and Cost Tracking: Validated OAuth detection, graceful fallback on authentication failures (e.g., permission denied, 401 Unauthorized, credits exhausted), and accurate recording of requested_tier and actual_tier for cost tracking.
  • Regression Prevention and Failure Handling: Ensured no regressions in the pure opencode dispatch path, and verified the functionality of the Dispatch Dedup Guard (t1206) and Prompt-Repeat Strategy (t1097) for handling task failures.
  • Minor Logging Refactor: Updated a deprecated log_warning function call to log_warn in the supervisor dispatch script for consistency.
Changelog
  • .agents/scripts/supervisor/dispatch.sh
    • Updated logging function call from log_warning to log_warn.
  • tests/test-dual-cli-e2e.sh
    • Added a new end-to-end test script for dual-CLI architecture verification.
    • Implemented mock binaries for opencode and claude to simulate CLI behavior.
    • Included tests for CLI resolution, command generation, model tier classification, and cost tracking.
    • Added tests for OAuth detection, authentication failure handling, and prompt repeat strategy.
Activity
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces a comprehensive end-to-end test suite for the dual-CLI architecture, which is a valuable addition for verifying routing and fallback logic. However, the new test script contains several violations of the repository's shell scripting standards, particularly regarding explicit return statements, variable declaration patterns, and blanket error suppression. Adhering to these standards ensures consistency and improves the debuggability of the test framework.

The deprecated claude CLI fallback path in resolve_ai_cli() called
log_warning() which does not exist — the correct function is log_warn().
Under set -euo pipefail, this caused the fallback to silently fail,
making the claude CLI unreachable even when installed.

Discovered during dual-CLI architecture verification testing.
@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 34 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Feb 21 13:27:01 UTC 2026: Code review monitoring started
Sat Feb 21 13:27:01 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 34

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 34
  • VULNERABILITIES: 0

Generated on: Sat Feb 21 13:27:04 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
tests/test-dual-cli-e2e.sh (4)

431-496: Missing test coverage for Claude CLI --mcp-config flag in build_cli_cmd.

Section 3 tests Claude's run, version, and probe commands but doesn't verify the --mcp-config / --strict-mcp-config flag injection (dispatch.sh lines 2198-2202). Since MCP worker isolation is a key part of the dual-CLI architecture (t1162), consider adding a test:

💡 Suggested test for MCP config flag
# Test: claude run with --mcp-config
cl_mcp_cmd=$(run_in_supervisor_env "
    build_cli_cmd --cli claude --action run --output array \
        --model 'anthropic/claude-sonnet-4-6' \
        --prompt 'Test prompt' \
        --mcp-config '/tmp/test-mcp.json'
")

if echo "$cl_mcp_cmd" | grep -q "\-\-mcp-config" && echo "$cl_mcp_cmd" | grep -q "\-\-strict-mcp-config"; then
    pass "Claude run: includes --mcp-config and --strict-mcp-config"
else
    fail "Claude run: should include MCP config flags" "Got: $cl_mcp_cmd"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-dual-cli-e2e.sh` around lines 431 - 496, Tests for the Claude CLI
in section "3. build_cli_cmd for Claude" are missing coverage for injecting MCP
isolation flags; update the tests to call build_cli_cmd with --mcp-config (and a
sample path) and assert the produced command string includes both --mcp-config
and --strict-mcp-config as handled by the build_cli_cmd logic (see build_cli_cmd
and the dispatch.sh MCP injection). Add a new block similar to the suggested
snippet that captures cl_mcp_cmd via run_in_supervisor_env and uses
grep/conditionals to pass/fail based on presence of the two flags.

1138-1193: Dead test code: pr_auth is computed but never asserted.

Lines 1138–1172 compute pr_auth but no assertion references it. The actual assertion is on pr_reason (line 1189). Additionally, the || true on line 1170 means $? on line 1171 is always 0, making the exit code capture meaningless.

You can safely remove the pr_auth block (lines 1138–1172) since pr_reason already covers the same test:

🧹 Remove dead test code
 # Test: non-retryable failures are not eligible
-pr_auth=$(bash -c "
-    export AIDEVOPS_SUPERVISOR_DIR='$TEST_DIR/supervisor'
-    export SUPERVISOR_PROMPT_REPEAT_ENABLED=true
-    BLUE='' GREEN='' YELLOW='' RED='' NC=''
-    SUPERVISOR_LOG='/dev/null'
-    SUPERVISOR_DIR='$TEST_DIR/supervisor'
-    SUPERVISOR_DB='$TEST_DIR/supervisor/supervisor.db'
-    source '$SHARED_CONSTANTS'
-    source '$SUPERVISOR_DIR_MODULE/_common.sh'
-    source '$SUPERVISOR_DIR_MODULE/database.sh'
-    source '$SUPERVISOR_DIR_MODULE/dispatch.sh'
-    should_prompt_repeat 'pr-t1' 'auth_error' || true
-    echo \"exit:\$?\"
-" 2>/dev/null | grep "^exit:" | head -1)
-
-# The function outputs the reason then returns 1 for non-retryable
 pr_reason=$(bash -c "
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-dual-cli-e2e.sh` around lines 1138 - 1193, Remove the dead pr_auth
test block (the bash subshell that sets pr_auth and the associated exit capture)
since pr_reason already validates non-retryable behavior; delete the unused
variable pr_auth and the redundant "|| true" that forces a zero exit status so
the exit code capture is meaningless, and ensure the test relies on
should_prompt_repeat via pr_reason (the existing assertion that greps for
"non_retryable") to validate auth_error behavior.

221-234: run_in_supervisor_env sources modules and evaluates $1 directly in a subshell.

Line 232 expands $1 directly inside bash -c, meaning arbitrary shell code can execute. This is by design for the test helper, but note that 2>/dev/null on line 233 swallows all stderr — including any unexpected error output from the sourced modules. If a test is mysteriously failing, this suppression can mask the root cause.

Consider adding a --verbose mode that conditionally preserves stderr:

💡 Optional: verbose-aware stderr handling
 run_in_supervisor_env() {
-	bash -c "
+	local stderr_dest="/dev/null"
+	[[ "$VERBOSE" == "--verbose" ]] && stderr_dest="/dev/stderr"
+	bash -c "
         export AIDEVOPS_SUPERVISOR_DIR='$TEST_DIR/supervisor'
         export PATH='$MOCK_BIN:\$PATH'
         BLUE='' GREEN='' YELLOW='' RED='' NC=''
         SUPERVISOR_LOG='/dev/null'
         SUPERVISOR_DIR='$TEST_DIR/supervisor'
         SUPERVISOR_DB='$TEST_DIR/supervisor/supervisor.db'
         source '$SHARED_CONSTANTS'
         source '$SUPERVISOR_DIR_MODULE/_common.sh'
         source '$SUPERVISOR_DIR_MODULE/dispatch.sh'
         $1
-    " 2>/dev/null
+    " 2>"$stderr_dest"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-dual-cli-e2e.sh` around lines 221 - 234, The helper
run_in_supervisor_env currently hard-suppresses stderr by appending
"2>/dev/null" to the bash -c invocation; change it to honor a verbose flag so
errors from the sourced modules (_common.sh, dispatch.sh) are visible when
needed: add a check for an environment variable (e.g., AIDEVOPS_VERBOSE) or
accept a second parameter and, if verbose is set, omit the "2>/dev/null"
redirection (or redirect stderr to a log file) otherwise keep the current
suppression; ensure default behavior remains suppressed and reference
run_in_supervisor_env and the sourced SUPERVISOR_DIR_MODULE files when making
the change.

833-843: Batch CLI routing test doesn't vary by model — validates default, not per-model routing.

The loop calls resolve_ai_cli without passing a model argument for each task, so it only confirms the default CLI is opencode — not that OAuth-aware routing would correctly handle Anthropic vs. non-Anthropic models. The test title says "all tasks route through the same CLI" which is accurate for the no-OAuth case, but consider adding a comment clarifying this is the "no OAuth configured" scenario.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-dual-cli-e2e.sh` around lines 833 - 843, The loop currently
invokes resolve_ai_cli with no model, so it only validates the default CLI;
update the loop to pass an explicit model for each task (map each task_id to the
intended model and call resolve_ai_cli "$model" inside run_in_supervisor_env) so
the test exercises per-model routing logic, referencing resolve_ai_cli,
run_in_supervisor_env and task_id; alternatively, if you intend to test the
no-OAuth default only, add a concise comment above the loop stating this is the
"no OAuth configured" scenario.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test-dual-cli-e2e.sh`:
- Around line 431-496: Tests for the Claude CLI in section "3. build_cli_cmd for
Claude" are missing coverage for injecting MCP isolation flags; update the tests
to call build_cli_cmd with --mcp-config (and a sample path) and assert the
produced command string includes both --mcp-config and --strict-mcp-config as
handled by the build_cli_cmd logic (see build_cli_cmd and the dispatch.sh MCP
injection). Add a new block similar to the suggested snippet that captures
cl_mcp_cmd via run_in_supervisor_env and uses grep/conditionals to pass/fail
based on presence of the two flags.
- Around line 1138-1193: Remove the dead pr_auth test block (the bash subshell
that sets pr_auth and the associated exit capture) since pr_reason already
validates non-retryable behavior; delete the unused variable pr_auth and the
redundant "|| true" that forces a zero exit status so the exit code capture is
meaningless, and ensure the test relies on should_prompt_repeat via pr_reason
(the existing assertion that greps for "non_retryable") to validate auth_error
behavior.
- Around line 221-234: The helper run_in_supervisor_env currently
hard-suppresses stderr by appending "2>/dev/null" to the bash -c invocation;
change it to honor a verbose flag so errors from the sourced modules
(_common.sh, dispatch.sh) are visible when needed: add a check for an
environment variable (e.g., AIDEVOPS_VERBOSE) or accept a second parameter and,
if verbose is set, omit the "2>/dev/null" redirection (or redirect stderr to a
log file) otherwise keep the current suppression; ensure default behavior
remains suppressed and reference run_in_supervisor_env and the sourced
SUPERVISOR_DIR_MODULE files when making the change.
- Around line 833-843: The loop currently invokes resolve_ai_cli with no model,
so it only validates the default CLI; update the loop to pass an explicit model
for each task (map each task_id to the intended model and call resolve_ai_cli
"$model" inside run_in_supervisor_env) so the test exercises per-model routing
logic, referencing resolve_ai_cli, run_in_supervisor_env and task_id;
alternatively, if you intend to test the no-OAuth default only, add a concise
comment above the loop stating this is the "no OAuth configured" scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant