MCP callers never see CC questions — causes duplicate sessions #599

@OneStepAt4time

Description

When an MCP caller (e.g. Hep) spawns a session via create_session, and CC enters ask_question state (plan mode menu, clarification prompt, etc.), the question is forwarded to Telegram but never relayed back to the MCP caller. The caller has no visibility into the pending question, cannot respond, and eventually spawns a duplicate session for the same task.

Observed while working on issue #397: two sessions were created (fix-397-tmux-crash-recovery and fix-397-tmux-crash-recovery-v2) about 16 minutes apart. Per the logs below, the first session emitted its question at 23:35 and was still in ask_question when the duplicate was created at 23:47:04; CC was waiting for a human answer that never came.

Steps to reproduce

  1. Start Aegis (npx aegis-bridge)
  2. Via MCP create_session, spawn a session with a prompt that triggers CC plan mode (e.g. a complex refactoring task)
  3. CC analyzes the codebase and asks an interactive question with numbered options
  4. Observe: get_status returns "status": "ask_question" but no question content
  5. The MCP caller has no way to see or answer the question
  6. Caller creates a second session for the same task

Expected behavior

  1. get_status should include the full question content, options, and toolUseId so the caller can read and answer it via send_message
  2. create_session should optionally wait for the first stable state and return pending questions inline
  3. Duplicate sessions for the same logical task should be preventable server-side

Root Cause Analysis

Three gaps in the current architecture:

Gap 1: Question content not exposed via MCP

monitor.ts L596 detects ask_question and emits status.question to Telegram (with buttons). But get_status in mcp-server.ts only returns the status string — no question content, no options, no toolUseId. The data exists in session.ts (pendingQuestions Map) but is never surfaced to MCP callers.

Gap 2: create_session is fire-and-forget

The MCP tool returns immediately with { id, status: "created" }. The caller must poll get_status to know what happened — and even then, cannot see question content (Gap 1). There is no way to await the first meaningful state.

Gap 3: No dedup mechanism

Nothing prevents creating multiple sessions for the same logical task. No idempotency key, no tag system, no server-side guard.

Proposed Fix — 3 interventions

1. Question Relay in get_status (minimal change)

Expose pendingQuestion from the existing pendingQuestions Map in session.ts:

{
  "status": "ask_question",
  "pendingQuestion": {
    "toolUseId": "toolu_abc123",
    "content": "Which recovery strategy? 1) Clean up 2) Auto-restart 3) Both",
    "options": ["Clean up", "Auto-restart", "Both"],
    "since": 1774906529600
  }
}

When the caller responds via send_message, Aegis uses toolUseId to resolve the correct pending Promise.
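
A minimal sketch of how the relay could work, assuming the pendingQuestions Map stores each question together with the resolver of the Promise CC is waiting on. Only the field names toolUseId, content, options, and since come from the payload above; QuestionRelay, buildStatus, and answer are hypothetical names for illustration, not the current code in session.ts or mcp-server.ts.

```typescript
// Field names follow the example payload above; all other names are assumptions.
interface PendingQuestion {
  toolUseId: string;
  content: string;
  options: string[];
  since: number; // epoch milliseconds
}

type SessionStatus = "working" | "idle" | "ask_question";

interface StatusPayload {
  status: SessionStatus;
  pendingQuestion?: PendingQuestion;
}

// Stand-in for an entry of the pendingQuestions Map: the question plus the
// resolver of the Promise that settles when somebody answers.
interface PendingEntry {
  question: PendingQuestion;
  resolve: (answer: string) => void;
}

class QuestionRelay {
  private pending = new Map<string, PendingEntry>();

  // Called when ask_question is detected. The Promise settles once Telegram
  // or an MCP caller supplies an answer for this toolUseId.
  ask(question: PendingQuestion): Promise<string> {
    return new Promise<string>((resolve) => {
      this.pending.set(question.toolUseId, { question, resolve });
    });
  }

  // What get_status could return: the oldest pending question, if any.
  buildStatus(current: SessionStatus): StatusPayload {
    const oldest = Array.from(this.pending.values())
      .map((entry) => entry.question)
      .sort((a, b) => a.since - b.since)[0];
    return oldest
      ? { status: "ask_question", pendingQuestion: oldest }
      : { status: current };
  }

  // What send_message could call: resolve only the Promise matching toolUseId.
  answer(toolUseId: string, text: string): boolean {
    const entry = this.pending.get(toolUseId);
    if (!entry) return false; // unknown id, or already answered
    this.pending.delete(toolUseId);
    entry.resolve(text);
    return true;
  }
}
```

Returning false from answer for an unknown toolUseId lets the MCP layer report a stale or duplicate reply instead of silently dropping it.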

2. create_session with optional waitForStable

New parameter: waitForStable: boolean (default true for MCP, false for REST backward-compat).

  • true → block until CC reaches working (30s+), idle, or ask_question, subject to a configurable timeout
  • If CC asks a question during the wait → return immediately with pendingQuestion in the response
  • If the timeout expires → return with the current status
  • false → current behavior (fire-and-forget)

Interventions 1 and 2 are complementary: wait covers the initial question, relay covers subsequent ones.
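
The wait semantics above could be sketched as a polling loop. This is an illustration under assumptions, not the proposed implementation: it reads "working (30s+)" as "has stayed in working for at least workingStableMs", and the names waitForStable (as a function), WaitOptions, Snapshot, and the timedOut flag are hypothetical.

```typescript
type Status = "created" | "working" | "idle" | "ask_question";

interface Snapshot {
  status: Status;
  pendingQuestion?: { toolUseId: string; content: string };
}

interface WaitOptions {
  timeoutMs: number;       // overall deadline for the wait
  workingStableMs: number; // how long "working" must persist to count as stable
  pollMs: number;          // delay between status checks
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function waitForStable(
  getStatus: () => Snapshot,
  opts: WaitOptions,
  now: () => number = Date.now,
): Promise<Snapshot & { timedOut: boolean }> {
  const deadline = now() + opts.timeoutMs;
  let workingSince: number | null = null;

  for (;;) {
    const snap = getStatus();

    // idle and ask_question are stable immediately; a question is returned inline.
    if (snap.status === "idle" || snap.status === "ask_question") {
      return { ...snap, timedOut: false };
    }

    if (snap.status === "working") {
      if (workingSince === null) workingSince = now();
      if (now() - workingSince >= opts.workingStableMs) {
        return { ...snap, timedOut: false };
      }
    } else {
      workingSince = null; // left "working", restart the stability clock
    }

    // On timeout, return whatever the current status is.
    if (now() >= deadline) return { ...snap, timedOut: true };
    await sleep(opts.pollMs);
  }
}
```

An event-driven version that subscribes to the monitor's status events would avoid polling; the loop is used here only because it keeps the sketch self-contained.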

3. Idempotency key for dedup

New optional field on create_session:

{ "idempotencyKey": "issue:397", "workDir": "..." }

If an active session (one that is neither killed nor dead) with the same idempotencyKey exists → return that session instead of creating a new one. Deterministic, zero race conditions, zero agent cooperation required.

Tags remain as optional metadata for list_sessions filtering (organizational, not dedup).
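
A rough sketch of the dedup guard, assuming an in-memory registry. SessionRegistry, markEnded, the reused flag, and the generated ids are hypothetical; only idempotencyKey and the killed/dead states come from the proposal. The race-free property depends on the key being registered synchronously, before any asynchronous spawn work begins.

```typescript
type State = "created" | "working" | "idle" | "ask_question" | "killed" | "dead";

interface Session {
  id: string;
  state: State;
  idempotencyKey: string | undefined;
}

class SessionRegistry {
  private sessions = new Map<string, Session>();
  private byKey = new Map<string, string>(); // idempotencyKey -> session id
  private nextId = 1;

  create(req: { idempotencyKey?: string }): { session: Session; reused: boolean } {
    const key = req.idempotencyKey;

    if (key !== undefined) {
      const existingId = this.byKey.get(key);
      const existing = existingId ? this.sessions.get(existingId) : undefined;
      // Reuse only sessions that are still active.
      if (existing && existing.state !== "killed" && existing.state !== "dead") {
        return { session: existing, reused: true };
      }
    }

    // Check and registration happen in one synchronous step, so two requests
    // with the same key cannot both get past the lookup above.
    const session: Session = {
      id: `s${this.nextId++}`,
      state: "created",
      idempotencyKey: key,
    };
    this.sessions.set(session.id, session);
    if (key !== undefined) this.byKey.set(key, session.id);
    return { session, reused: false };
  }

  // Marks a session as ended so that its key can be used again.
  markEnded(id: string, state: "killed" | "dead"): void {
    const session = this.sessions.get(id);
    if (session) session.state = state;
  }
}
```

With this shape, a second create_session call carrying "issue:397" would get the first session back while it is still waiting in ask_question, instead of spawning a -v2 duplicate.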

What we explicitly do NOT do

  • No auto-answer: CC asks → agent thinks → agent responds. Aegis is a bridge, not a brain.
  • No forced naming conventions: idempotency keys and tags are opt-in.
  • No breaking changes: REST API keeps current behavior unless new params are used.

Relevant logs

# Session 1 created at 23:30:45, stuck in ask_question
POST /v1/sessions → 331e0edd (fix-397-tmux-crash-recovery)
status.question emitted at 23:35 → forwarded to Telegram → no response

# Session 2 created at 23:47:04 (16 min later, same issue)
POST /v1/sessions → 99e68ee1 (fix-397-tmux-crash-recovery-v2)

Environment

  • Aegis version: 2.2.5
  • Node.js version: v22.22.1
  • OS: Ubuntu 24.04.4 LTS
  • Claude Code version: 2.1.87
  • tmux version: 3.4
  • Mode: MCP (stdio)
  • Reproducible?: Always (any task that triggers CC plan mode questions via MCP)

Metadata

Labels

bug (Something isn't working)
