t1164: End-to-end verification of dual-CLI architecture#2105
t1164: End-to-end verification of dual-CLI architecture#2105marcusquinn merged 1 commit intomainfrom
Conversation
WalkthroughThis PR updates logging semantics in the supervisor dispatch script and introduces a comprehensive end-to-end test suite that validates supervisor orchestration logic across CLI resolution, command construction, model routing, authentication flows, cost tracking, batch dispatch, escalation, and error handling scenarios. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on robust end-to-end verification of the system's dual-CLI architecture. It introduces extensive testing to confirm that tasks are correctly routed to the appropriate CLI, models are resolved based on complexity and configuration, and various failure scenarios, particularly around authentication and cost, are handled gracefully. The changes ensure the stability and reliability of the dispatch mechanism without introducing regressions. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request introduces a comprehensive end-to-end test suite for the dual-CLI architecture, which is a valuable addition for verifying routing and fallback logic. However, the new test script contains several violations of the repository's shell scripting standards, particularly regarding explicit return statements, variable declaration patterns, and blanket error suppression. Adhering to these standards ensures consistency and improves the debuggability of the test framework.
The deprecated claude CLI fallback path in resolve_ai_cli() called log_warning() which does not exist — the correct function is log_warn(). Under set -euo pipefail, this caused the fallback to silently fail, making the claude CLI unreachable even when installed. Discovered during dual-CLI architecture verification testing.
c69416b to
148af93
Compare
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Feb 21 13:27:04 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
There was a problem hiding this comment.
🧹 Nitpick comments (4)
tests/test-dual-cli-e2e.sh (4)
431-496: Missing test coverage for Claude CLI--mcp-configflag inbuild_cli_cmd.Section 3 tests Claude's
run,version, andprobecommands but doesn't verify the--mcp-config/--strict-mcp-configflag injection (dispatch.sh lines 2198-2202). Since MCP worker isolation is a key part of the dual-CLI architecture (t1162), consider adding a test:💡 Suggested test for MCP config flag
# Test: claude run with --mcp-config cl_mcp_cmd=$(run_in_supervisor_env " build_cli_cmd --cli claude --action run --output array \ --model 'anthropic/claude-sonnet-4-6' \ --prompt 'Test prompt' \ --mcp-config '/tmp/test-mcp.json' ") if echo "$cl_mcp_cmd" | grep -q "\-\-mcp-config" && echo "$cl_mcp_cmd" | grep -q "\-\-strict-mcp-config"; then pass "Claude run: includes --mcp-config and --strict-mcp-config" else fail "Claude run: should include MCP config flags" "Got: $cl_mcp_cmd" fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-dual-cli-e2e.sh` around lines 431 - 496, Tests for the Claude CLI in section "3. build_cli_cmd for Claude" are missing coverage for injecting MCP isolation flags; update the tests to call build_cli_cmd with --mcp-config (and a sample path) and assert the produced command string includes both --mcp-config and --strict-mcp-config as handled by the build_cli_cmd logic (see build_cli_cmd and the dispatch.sh MCP injection). Add a new block similar to the suggested snippet that captures cl_mcp_cmd via run_in_supervisor_env and uses grep/conditionals to pass/fail based on presence of the two flags.
1138-1193: Dead test code:pr_authis computed but never asserted.Lines 1138–1172 compute
pr_authbut no assertion references it. The actual assertion is onpr_reason(line 1189). Additionally, the|| trueon line 1170 means$?on line 1171 is always0, making the exit code capture meaningless.You can safely remove the
pr_authblock (lines 1138–1172) sincepr_reasonalready covers the same test:🧹 Remove dead test code
# Test: non-retryable failures are not eligible -pr_auth=$(bash -c " - export AIDEVOPS_SUPERVISOR_DIR='$TEST_DIR/supervisor' - export SUPERVISOR_PROMPT_REPEAT_ENABLED=true - BLUE='' GREEN='' YELLOW='' RED='' NC='' - SUPERVISOR_LOG='/dev/null' - SUPERVISOR_DIR='$TEST_DIR/supervisor' - SUPERVISOR_DB='$TEST_DIR/supervisor/supervisor.db' - source '$SHARED_CONSTANTS' - source '$SUPERVISOR_DIR_MODULE/_common.sh' - source '$SUPERVISOR_DIR_MODULE/database.sh' - source '$SUPERVISOR_DIR_MODULE/dispatch.sh' - should_prompt_repeat 'pr-t1' 'auth_error' || true - echo \"exit:\$?\" -" 2>/dev/null | grep "^exit:" | head -1) - -# The function outputs the reason then returns 1 for non-retryable pr_reason=$(bash -c "🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-dual-cli-e2e.sh` around lines 1138 - 1193, Remove the dead pr_auth test block (the bash subshell that sets pr_auth and the associated exit capture) since pr_reason already validates non-retryable behavior; delete the unused variable pr_auth and the redundant "|| true" that forces a zero exit status so the exit code capture is meaningless, and ensure the test relies on should_prompt_repeat via pr_reason (the existing assertion that greps for "non_retryable") to validate auth_error behavior.
221-234:run_in_supervisor_envsources modules and evaluates$1directly in a subshell.Line 232 expands
$1directly insidebash -c, meaning arbitrary shell code can execute. This is by design for the test helper, but note that2>/dev/nullon line 233 swallows all stderr — including any unexpected error output from the sourced modules. If a test is mysteriously failing, this suppression can mask the root cause.Consider adding a
--verbosemode that conditionally preserves stderr:💡 Optional: verbose-aware stderr handling
run_in_supervisor_env() { - bash -c " + local stderr_dest="/dev/null" + [[ "$VERBOSE" == "--verbose" ]] && stderr_dest="/dev/stderr" + bash -c " export AIDEVOPS_SUPERVISOR_DIR='$TEST_DIR/supervisor' export PATH='$MOCK_BIN:\$PATH' BLUE='' GREEN='' YELLOW='' RED='' NC='' SUPERVISOR_LOG='/dev/null' SUPERVISOR_DIR='$TEST_DIR/supervisor' SUPERVISOR_DB='$TEST_DIR/supervisor/supervisor.db' source '$SHARED_CONSTANTS' source '$SUPERVISOR_DIR_MODULE/_common.sh' source '$SUPERVISOR_DIR_MODULE/dispatch.sh' $1 - " 2>/dev/null + " 2>"$stderr_dest" }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-dual-cli-e2e.sh` around lines 221 - 234, The helper run_in_supervisor_env currently hard-suppresses stderr by appending "2>/dev/null" to the bash -c invocation; change it to honor a verbose flag so errors from the sourced modules (_common.sh, dispatch.sh) are visible when needed: add a check for an environment variable (e.g., AIDEVOPS_VERBOSE) or accept a second parameter and, if verbose is set, omit the "2>/dev/null" redirection (or redirect stderr to a log file) otherwise keep the current suppression; ensure default behavior remains suppressed and reference run_in_supervisor_env and the sourced SUPERVISOR_DIR_MODULE files when making the change.
833-843: Batch CLI routing test doesn't vary by model — validates default, not per-model routing.The loop calls
resolve_ai_cliwithout passing a model argument for each task, so it only confirms the default CLI isopencode— not that OAuth-aware routing would correctly handle Anthropic vs. non-Anthropic models. The test title says "all tasks route through the same CLI" which is accurate for the no-OAuth case, but consider adding a comment clarifying this is the "no OAuth configured" scenario.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-dual-cli-e2e.sh` around lines 833 - 843, The loop currently invokes resolve_ai_cli with no model, so it only validates the default CLI; update the loop to pass an explicit model for each task (map each task_id to the intended model and call resolve_ai_cli "$model" inside run_in_supervisor_env) so the test exercises per-model routing logic, referencing resolve_ai_cli, run_in_supervisor_env and task_id; alternatively, if you intend to test the no-OAuth default only, add a concise comment above the loop stating this is the "no OAuth configured" scenario.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test-dual-cli-e2e.sh`:
- Around line 431-496: Tests for the Claude CLI in section "3. build_cli_cmd for
Claude" are missing coverage for injecting MCP isolation flags; update the tests
to call build_cli_cmd with --mcp-config (and a sample path) and assert the
produced command string includes both --mcp-config and --strict-mcp-config as
handled by the build_cli_cmd logic (see build_cli_cmd and the dispatch.sh MCP
injection). Add a new block similar to the suggested snippet that captures
cl_mcp_cmd via run_in_supervisor_env and uses grep/conditionals to pass/fail
based on presence of the two flags.
- Around line 1138-1193: Remove the dead pr_auth test block (the bash subshell
that sets pr_auth and the associated exit capture) since pr_reason already
validates non-retryable behavior; delete the unused variable pr_auth and the
redundant "|| true" that forces a zero exit status so the exit code capture is
meaningless, and ensure the test relies on should_prompt_repeat via pr_reason
(the existing assertion that greps for "non_retryable") to validate auth_error
behavior.
- Around line 221-234: The helper run_in_supervisor_env currently
hard-suppresses stderr by appending "2>/dev/null" to the bash -c invocation;
change it to honor a verbose flag so errors from the sourced modules
(_common.sh, dispatch.sh) are visible when needed: add a check for an
environment variable (e.g., AIDEVOPS_VERBOSE) or accept a second parameter and,
if verbose is set, omit the "2>/dev/null" redirection (or redirect stderr to a
log file) otherwise keep the current suppression; ensure default behavior
remains suppressed and reference run_in_supervisor_env and the sourced
SUPERVISOR_DIR_MODULE files when making the change.
- Around line 833-843: The loop currently invokes resolve_ai_cli with no model,
so it only validates the default CLI; update the loop to pass an explicit model
for each task (map each task_id to the intended model and call resolve_ai_cli
"$model" inside run_in_supervisor_env) so the test exercises per-model routing
logic, referencing resolve_ai_cli, run_in_supervisor_env and task_id;
alternatively, if you intend to test the no-OAuth default only, add a concise
comment above the loop stating this is the "no OAuth configured" scenario.



WIP - End-to-end verification of dual-CLI architecture (t1164)
Incremental commits. Verifies: CLI routing, OAuth detection, auth failure fallback, cost tracking, no regressions.
Ref #1761
Summary by CodeRabbit
Tests
Chores