Description
When an MCP caller (e.g. Hep) spawns a session via create_session, and CC enters ask_question state (plan mode menu, clarification prompt, etc.), the question is forwarded to Telegram but never relayed back to the MCP caller. The caller has no visibility into the pending question, cannot respond, and eventually spawns a duplicate session for the same task.
Observed with issue #397: two sessions created (fix-397-tmux-crash-recovery and fix-397-tmux-crash-recovery-v2) 16 minutes apart. The first was stuck in ask_question for the entire time — CC was waiting for a human answer that never came.
Steps to reproduce
- Start Aegis (npx aegis-bridge)
- Via MCP create_session, spawn a session with a prompt that triggers CC plan mode (e.g. a complex refactoring task)
- CC analyzes the codebase and asks an interactive question with numbered options
- Observe: get_status returns "status": "ask_question" but no question content
- The MCP caller has no way to see or answer the question
- Caller creates a second session for the same task
Expected behavior
- get_status should include the full question content, options, and toolUseId so the caller can read and answer it via send_message
- create_session should optionally wait for the first stable state and return pending questions inline
- Duplicate sessions for the same logical task should be preventable server-side
Root Cause Analysis
Three gaps in the current architecture:
Gap 1: Question content not exposed via MCP
monitor.ts L596 detects ask_question and emits status.question to Telegram (with buttons). But get_status in mcp-server.ts only returns the status string — no question content, no options, no toolUseId. The data exists in session.ts (pendingQuestions Map) but is never surfaced to MCP callers.
Gap 2: create_session is fire-and-forget
The MCP tool returns immediately with { id, status: "created" }. The caller must poll get_status to know what happened — and even then, cannot see question content (Gap 1). There is no way to await the first meaningful state.
Gap 3: No dedup mechanism
Nothing prevents creating multiple sessions for the same logical task. No idempotency key, no tag system, no server-side guard.
Proposed Fix — 3 interventions
1. Question Relay in get_status (minimal change)
Expose pendingQuestion from the existing pendingQuestions Map in session.ts:
{
"status": "ask_question",
"pendingQuestion": {
"toolUseId": "toolu_abc123",
"content": "Which recovery strategy? 1) Clean up 2) Auto-restart 3) Both",
"options": ["Clean up", "Auto-restart", "Both"],
"since": 1774906529600
}
}
When the caller responds via send_message, Aegis uses toolUseId to resolve the correct pending Promise.
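A minimal sketch of how the relay and the answer path could fit together. Only the field names toolUseId, content, options, and since come from the JSON above; the PendingEntry shape, the choice to key the Map by toolUseId, and the function names buildStatus and answerQuestion are assumptions for illustration, not the actual session.ts or mcp-server.ts code.

```typescript
// Fields taken from the proposed get_status payload.
interface PendingQuestion {
  toolUseId: string;
  content: string;
  options: string[];
  since: number; // epoch milliseconds
}

// Hypothetical entry shape: the real pendingQuestions Map may differ.
interface PendingEntry {
  sessionId: string;
  question: PendingQuestion;
  resolve: (answer: string) => void; // settles the pending Promise
}

type SessionStatus = "created" | "working" | "idle" | "ask_question";

interface StatusResponse {
  status: SessionStatus;
  pendingQuestion?: PendingQuestion;
}

// Assumption: the Map is keyed by toolUseId.
const pendingQuestions = new Map<string, PendingEntry>();

// Attach the question only while the session is actually waiting on it.
function buildStatus(sessionId: string, status: SessionStatus): StatusResponse {
  if (status !== "ask_question") return { status };
  const entry = Array.from(pendingQuestions.values()).find(
    (e) => e.sessionId === sessionId,
  );
  return entry === undefined ? { status } : { status, pendingQuestion: entry.question };
}

// Resolve the pending Promise that matches toolUseId.
// Returns false when nothing is waiting under that id.
function answerQuestion(toolUseId: string, answer: string): boolean {
  const entry = pendingQuestions.get(toolUseId);
  if (entry === undefined) return false;
  pendingQuestions.delete(toolUseId);
  entry.resolve(answer);
  return true;
}
```

Deleting the entry before resolving means a repeated answer for the same toolUseId is rejected rather than delivered twice.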
2. create_session with optional waitForStable
New parameter: waitForStable: boolean (default true for MCP, false for REST backward-compat).
- true: block until CC reaches working (30s+), idle, or ask_question, with configurable timeout
  - If CC asks a question during wait → return immediately with pendingQuestion in response
  - If timeout → return with current status
- false → current behavior (fire-and-forget)
Interventions 1 and 2 are complementary: wait covers the initial question, relay covers subsequent ones.
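The wait logic could be sketched roughly as below. This is a polling simplification under stated assumptions: the function name waitForStable, the readStatus callback, and the option names timeoutMs, pollMs, and workingStableMs are all hypothetical, and "working (30s+)" is read here as "has stayed in working for at least that long". A real implementation might subscribe to monitor events instead of polling.

```typescript
type SessionStatus = "created" | "working" | "idle" | "ask_question";

interface StatusSnapshot {
  status: SessionStatus;
  pendingQuestion?: { toolUseId: string; content: string; options: string[]; since: number };
}

// Hypothetical option names; none of these exist in the current API.
interface WaitOptions {
  timeoutMs: number;       // overall deadline
  pollMs: number;          // delay between status reads
  workingStableMs: number; // how long "working" must persist (30s in the proposal)
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForStable(
  readStatus: () => StatusSnapshot,
  opts: WaitOptions,
): Promise<StatusSnapshot> {
  const start = Date.now();
  let workingSince: number | null = null;
  for (;;) {
    const snap = readStatus();
    const now = Date.now();
    // idle and ask_question are stable immediately; a question returns inline.
    if (snap.status === "idle" || snap.status === "ask_question") return snap;
    if (snap.status === "working") {
      if (workingSince === null) workingSince = now;
      if (now - workingSince >= opts.workingStableMs) return snap;
    } else {
      workingSince = null; // left "working", restart the stability window
    }
    // On timeout, return whatever the current status is.
    if (now - start >= opts.timeoutMs) return snap;
    await sleep(opts.pollMs);
  }
}
```

With waitForStable: false the caller would skip this function and get the current { id, status: "created" } response unchanged.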
3. Idempotency key for dedup
New optional field on create_session:
{ "idempotencyKey": "issue:397", "workDir": "..." }
If an active (non-killed/dead) session with the same idempotencyKey exists → return that session instead of creating a new one. Deterministic, zero race conditions, zero agent cooperation required.
Tags remain as optional metadata for list_sessions filtering (organizational, not dedup).
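As an illustration of the dedup guard, here is a small in-memory sketch. The Session shape, the activeByKey index, the reused flag, and the generated ids are assumptions made for the example; only idempotencyKey, workDir, and the "non-killed/dead" rule come from the proposal.

```typescript
type SessionState = "created" | "working" | "idle" | "ask_question" | "killed" | "dead";

interface Session {
  id: string;
  workDir: string;
  state: SessionState;
  idempotencyKey?: string;
}

interface CreateSessionRequest {
  workDir: string;
  idempotencyKey?: string;
}

const sessions = new Map<string, Session>();
// Hypothetical index: idempotencyKey -> id of the latest session created with it.
const activeByKey = new Map<string, string>();
let nextId = 1;

const isActive = (s: Session): boolean => s.state !== "killed" && s.state !== "dead";

function createSession(req: CreateSessionRequest): { session: Session; reused: boolean } {
  const key = req.idempotencyKey;
  if (key !== undefined) {
    const existingId = activeByKey.get(key);
    const existing = existingId !== undefined ? sessions.get(existingId) : undefined;
    // Return the live session instead of spawning a duplicate.
    if (existing !== undefined && isActive(existing)) {
      return { session: existing, reused: true };
    }
  }
  const session: Session = { id: `session-${nextId++}`, workDir: req.workDir, state: "created" };
  if (key !== undefined) {
    session.idempotencyKey = key;
    activeByKey.set(key, session.id); // overwrites any stale (killed/dead) entry
  }
  sessions.set(session.id, session);
  return { session, reused: false };
}
```

In this sketch the lookup and the insert run synchronously with no await in between, so two requests handled by the same Node.js process cannot interleave inside the check. A setup with several processes or a persisted session store would need its own atomic guard.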
What we explicitly do NOT do
- No auto-answer: CC asks → agent thinks → agent responds. Aegis is a bridge, not a brain.
- No forced naming conventions: idempotency keys and tags are opt-in.
- No breaking changes: REST API keeps current behavior unless new params are used.
Relevant logs
# Session 1 created at 23:30:45, stuck in ask_question
POST /v1/sessions → 331e0edd (fix-397-tmux-crash-recovery)
status.question emitted at 23:35 → forwarded to Telegram → no response
# Session 2 created at 23:47:04 (16 min later, same issue)
POST /v1/sessions → 99e68ee1 (fix-397-tmux-crash-recovery-v2)
Environment
- Aegis version: 2.2.5
- Node.js version: v22.22.1
- OS: Ubuntu 24.04.4 LTS
- Claude Code version: 2.1.87
- tmux version: 3.4
- Mode: MCP (stdio)
- Reproducible?: Always (any task that triggers CC plan mode questions via MCP)