
[BUG] Remote Control sessions die after ~20 min idle — server TTL ignores keepalives #32982

@sidkandan

Environment

  • Claude Code: v2.1.72
  • OS: macOS (Darwin 25.2.0)
  • Plan: Max
  • Reproduced on: interactive CLI sessions (/remote-control), auto-RC (remoteControlAtStartup: true), and agent sessions (--agent)

Description

All Remote Control sessions silently die after ~5-30 minutes of idle time (most commonly ~20 minutes). The phone app shows "Failed to send message — An unknown network error has occurred" while the local CLI still displays "Remote Control active." The session_ingress endpoint returns HTTP 404 — the session is deregistered server-side while the CLI process is alive, connected, and polling.

This affects every RC user who steps away from their keyboard — the core use case of "start at your desk, continue from your couch."

Root Cause (source-verified against cli.js v2.1.72)

We identified two independent bugs that together leave idle RC sessions unprotected:

Bug 1: Server-side session TTL does not reset on keepalive messages

The WebSocket transport sends {"type":"keep_alive"} data frames every 5 minutes via startKeepaliveInterval() (300,000ms interval, Swz=300000). The WebSocket ping/pong mechanism also runs every 10 seconds (hwz=10000).

Neither mechanism prevents server-side session deregistration. We verified this empirically:

  • Sessions with the 5-minute keepalive actively sending keep_alive frames still die at ~20 minutes
  • Sessions with 10-second WebSocket pings running continuously still die at ~20 minutes
  • The server-side session TTL appears to only reset on real user/model activity (actual messages through the bridge), not on transport-level keepalive frames

Note on CLAUDE_CODE_REMOTE: When set (e.g., in standalone claude remote-control bridge mode), CLAUDE_CODE_REMOTE disables the 5-minute keepalive entirely via an early return in startKeepaliveInterval(). However, even when the keepalive IS running (interactive /remote-control sessions), sessions still die — confirming the server ignores these frames for TTL purposes.

// startKeepaliveInterval() — 5-min keepalive (runs for interactive sessions)
startKeepaliveInterval() {
  this.stopKeepaliveInterval();
  if (t6(process.env.CLAUDE_CODE_REMOTE)) return;  // skipped in standalone bridge mode
  this.keepAliveInterval = setInterval(() => {
    this.ws.send(JSON.stringify({type: "keep_alive"}) + "\n");
  }, 300000);  // 5 minutes
}
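
To make the inferred server behavior concrete, here is a minimal sketch of a session registry whose TTL is refreshed only by non-keepalive messages. The server code is not visible to us, so `SessionRegistry`, its methods, and the exact 20-minute TTL are all hypothetical stand-ins chosen to match the observations above, not the actual implementation.

```javascript
// Hypothetical model of the server-side registry. Names and the 20-minute
// TTL are assumptions that mirror the observed behavior, nothing more.
const MINUTE = 60 * 1000;

class SessionRegistry {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.lastActivity = new Map(); // sessionId -> time of last TTL reset
  }

  register(sessionId, now) {
    this.lastActivity.set(sessionId, now);
  }

  isAlive(sessionId, now) {
    const last = this.lastActivity.get(sessionId);
    return last !== undefined && now - last < this.ttlMs;
  }

  // Returns an HTTP-like status. Only non-keepalive messages refresh the TTL.
  receive(sessionId, message, now) {
    if (!this.isAlive(sessionId, now)) return 404;
    if (message.type !== "keep_alive") this.lastActivity.set(sessionId, now);
    return 200;
  }
}

// An idle session that sends keep_alive every 5 minutes for 25 minutes.
function simulateIdleSession() {
  const registry = new SessionRegistry(20 * MINUTE);
  registry.register("s1", 0);
  const statuses = [];
  for (let t = 5 * MINUTE; t <= 25 * MINUTE; t += 5 * MINUTE) {
    statuses.push(registry.receive("s1", { type: "keep_alive" }, t));
  }
  return statuses;
}
```

Under these assumptions the keepalives at 5, 10, and 15 minutes are accepted but do not extend the session, so the frame sent at 20 minutes already hits a 404. A single real message before expiry would push the deadline out by a full TTL.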

Bug 2: SEND_KEEPALIVES replacement is broken by refcount gating

CLAUDE_CODE_REMOTE_SEND_KEEPALIVES was designed as an additional keepalive mechanism that sends {"type":"keep_alive"} every 30 seconds through the transport. However, the keepalive interval is gated by a refcount (C36) that tracks active model processing:

// Refcount increment — called when tool execution or streaming starts
function DD1() {
  C36++;
  if (C36 === 1) ie7();  // start 30s keepalive interval
}

// Refcount decrement — called when tool execution or streaming ends
function XD1() {
  C36--;
  if (C36 === 0) {
    if (Dd !== null) { clearInterval(Dd); Dd = null; }  // ← CLEARS the keepalive interval
    kE9();  // start idle timer (informational only)
  }
}

// The 30s keepalive interval
function ie7() {
  if (Dd !== null) { clearInterval(Dd); Dd = null; }  // clear previous
  if (mg6 !== null) { clearTimeout(mg6); mg6 = null; }  // clear idle timer
  Dd = setInterval(() => {
    if (t6(process.env.CLAUDE_CODE_REMOTE_SEND_KEEPALIVES))
      I36?.();  // sends {"type":"keep_alive"} via registered callback
  }, 30000);  // 30 seconds
}

Visual summary of all three keepalive paths:

[Figure: RC Keepalive Architecture (3 mechanisms, all broken during idle)]

The flow:

  1. Model starts processing → DD1() increments C36 to 1 → ie7() starts the 30s keepalive interval
  2. Model finishes processing → XD1() decrements C36 to 0 → clearInterval(Dd) kills the keepalive
  3. Session is now idle with the SEND_KEEPALIVES mechanism stopped

The keepalive only runs while the model is actively processing — exactly when it's NOT needed. It stops during idle — exactly when sessions die.

We verified this empirically: sessions spawned with CLAUDE_CODE_REMOTE_SEND_KEEPALIVES=1 still died at ~25-30 minutes.
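
The refcount gating can be reproduced in isolation. The sketch below re-creates the flow with descriptive stand-ins for the minified names (`activeCount` for C36, `intervalId` for Dd, `begin`/`end` for DD1/XD1) and a small fake clock so it runs instantly. It assumes SEND_KEEPALIVES is set and omits the idle timer; `FakeClock` and `createGatedKeepalive` are hypothetical helpers, not code from cli.js.

```javascript
// Minimal fake timer so the sketch is deterministic and needs no real waiting.
class FakeClock {
  constructor() {
    this.now = 0;
    this.timers = new Map();
    this.nextId = 1;
  }
  setInterval(fn, ms) {
    const id = this.nextId++;
    this.timers.set(id, { fn, ms, due: this.now + ms });
    return id;
  }
  clearInterval(id) {
    this.timers.delete(id);
  }
  // Fire every interval callback that falls due within the next `ms`.
  advance(ms) {
    const end = this.now + ms;
    for (;;) {
      let next = null;
      for (const t of this.timers.values()) {
        if (t.due <= end && (next === null || t.due < next.due)) next = t;
      }
      if (next === null) break;
      this.now = next.due;
      next.due += next.ms;
      next.fn();
    }
    this.now = end;
  }
}

// Stand-in for the DD1 / XD1 / ie7 trio described in the report.
function createGatedKeepalive(clock, sendKeepalive) {
  let activeCount = 0;   // C36
  let intervalId = null; // Dd
  return {
    begin() { // DD1: processing starts
      activeCount++;
      if (activeCount === 1) {
        if (intervalId !== null) clock.clearInterval(intervalId);
        intervalId = clock.setInterval(sendKeepalive, 30000);
      }
    },
    end() { // XD1: processing ends
      activeCount--;
      if (activeCount === 0 && intervalId !== null) {
        clock.clearInterval(intervalId); // the keepalive stops here
        intervalId = null;
      }
    },
  };
}

// 90 seconds of model work followed by 20 minutes of idle time.
function countKeepalives() {
  const clock = new FakeClock();
  let sent = 0;
  const keepalive = createGatedKeepalive(clock, () => { sent++; });
  keepalive.begin();
  clock.advance(90 * 1000);
  const duringWork = sent;
  keepalive.end();
  clock.advance(20 * 60 * 1000);
  return { duringWork, duringIdle: sent - duringWork };
}
```

In this model, `countKeepalives()` reports keepalives only for the working phase and none for the idle phase, which matches the sessions that died despite SEND_KEEPALIVES=1.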

Bridge heartbeat — DISABLED server-side

// Server response from tengu_bridge_poll_interval_config:
{
  "poll_interval_ms_not_at_capacity": 2000,
  "poll_interval_ms_at_capacity": 600000,
  "heartbeat_interval_ms": 0
}

The bridge-level heartbeat infrastructure exists in the code (heartbeatWork() API call, poll loop heartbeat mode) but is server-disabled (heartbeat_interval_ms: 0).
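
As a rough illustration of what this config implies, the sketch below shows one way a poll loop could read these fields. The field names and values come from the server response above. The function `planPolling`, the "non-positive means disabled" rule, and the way capacity selects the poll interval are assumptions inferred from this report, not confirmed client logic.

```javascript
// Values observed in the tengu_bridge_poll_interval_config response.
const observedConfig = {
  poll_interval_ms_not_at_capacity: 2000,
  poll_interval_ms_at_capacity: 600000,
  heartbeat_interval_ms: 0,
};

// Hypothetical interpretation: pick the poll interval by capacity and treat
// a non-positive heartbeat interval as "heartbeat disabled".
function planPolling(config, atCapacity) {
  const pollMs = atCapacity
    ? config.poll_interval_ms_at_capacity
    : config.poll_interval_ms_not_at_capacity;
  const heartbeatMs =
    config.heartbeat_interval_ms > 0 ? config.heartbeat_interval_ms : null;
  return { pollMs, heartbeatMs };
}
```

With the observed values, `planPolling(observedConfig, false)` yields a 2-second poll and no heartbeat. Changing only `heartbeat_interval_ms` to a positive value such as 30000 would enable the heartbeat under this reading.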

Reproduction

  1. Start Claude Code: claude
  2. Enable RC: /remote-control
  3. Connect from Claude iOS/Android app
  4. Send one message to confirm connectivity
  5. Leave both sides completely idle
  6. Wait ~20 minutes
  7. Try sending a message from the phone → "Failed to send message — An unknown network error has occurred"
  8. The CLI still shows "Remote Control active"

Reproduction rate: 100% across 7 independent sessions tested (both relay agent sessions and normal interactive sessions).

Timeline observations:

| Session | Type | Age at death |
|---|---|---|
| Interactive (auto-RC) | remoteControlAtStartup | ~21 min |
| Agent relay #1 | --agent with /remote-control | ~20 min |
| Agent relay #2 | --agent with /remote-control | ~20 min |
| Agent relay #3 | --agent with SEND_KEEPALIVES=1 | ~25 min |
| Agent relay #4 | --agent with SEND_KEEPALIVES=1 | ~30 min |
| Agent relay #5 | --agent with /remote-control | ~25 min |
| Agent relay #6 | --agent with /remote-control | ~25 min |

Control test: 75 messages in 5 minutes to a fresh session — survived fine. Message volume does not cause the drop. It is purely time-based idle death.

Additional evidence: Curling the session_ingress endpoint directly returns HTTP 404 while the local CLI process is alive with TCP ESTABLISHED connections. The server deregisters the session before TCP teardown — the CLI never detects the loss.

Suggested Fix

Bug 1 (server-side): The server should count keep_alive messages (or WebSocket pings) as session activity for TTL purposes. Alternatively, enable heartbeat_interval_ms server-side (set to e.g. 30000) — the client-side heartbeat infrastructure already exists and runs unconditionally.

Bug 2 (client-side, one-line fix): Don't clear the keepalive interval when SEND_KEEPALIVES is set. In XD1(), skip clearing Dd when CLAUDE_CODE_REMOTE_SEND_KEEPALIVES is truthy:

function XD1() {
  C36--;
  if (C36 === 0) {
    if (!t6(process.env.CLAUDE_CODE_REMOTE_SEND_KEEPALIVES)) {
      if (Dd !== null) { clearInterval(Dd); Dd = null; }
    }
    kE9();
  }
}

This would make the 30-second keepalive run continuously during idle. Combined with Bug 1's fix (server counts keepalives as activity), idle sessions would survive indefinitely.

Current Workaround

The only effective mitigation is triggering periodic real model activity (e.g., sending a trivial message via terminal input every ~15 minutes). This resets the server-side session TTL because it generates actual messages through the bridge transport.
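
A minimal sketch of the timing logic behind such a workaround follows. `createIdleNudger` and the `sendNudge` callback are hypothetical. How the trivial message actually reaches the session (terminal input, a wrapper script, and so on) is left to the caller and is not something this report specifies.

```javascript
// Hypothetical helper: decides when a "nudge" (a trivial real message) is due.
// Time is passed in explicitly so the logic can be tested without waiting.
function createIdleNudger({ nudgeAfterMs, sendNudge, startTime = 0 }) {
  let lastActivity = startTime;
  return {
    // Call whenever a real message passes through the session.
    recordActivity(now) {
      lastActivity = now;
    },
    // Call periodically. Sends at most one nudge per idle window.
    tick(now) {
      if (now - lastActivity >= nudgeAfterMs) {
        sendNudge();
        lastActivity = now; // the nudge itself counts as real activity
        return true;
      }
      return false;
    },
  };
}

// Possible wiring (not executed here):
//   const nudger = createIdleNudger({
//     nudgeAfterMs: 15 * 60 * 1000,
//     sendNudge: () => { /* deliver a trivial message to the session */ },
//     startTime: Date.now(),
//   });
//   setInterval(() => nudger.tick(Date.now()), 60 * 1000);
```

Keeping the nudge threshold at 15 minutes leaves a margin below the ~20-minute deaths observed above. Calling `recordActivity` on real traffic avoids sending nudges while the session is in active use.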

We also patched cli.js to restore the 5-minute keepalive (removing the CLAUDE_CODE_REMOTE guard from startKeepaliveInterval) — sessions still died at ~20 minutes, confirming the server-side TTL is the primary bug. Client-side keepalive fixes alone are insufficient.

Impact

This affects every Remote Control user. The core marketing promise — "start a task at your desk, then pick it up from your phone on the couch" — breaks the moment you set your phone down for 20 minutes.

Related Issues

These all describe symptoms of this root cause:

Supporting downstream evidence: Henderson11 on #28571 traced WebSocket close code 1002 (protocol error) occurring after the disconnect — this is the downstream effect of the session expiring server-side. Our analysis identifies the upstream cause: the server-side TTL that ignores keepalive traffic, compounded by the SEND_KEEPALIVES refcount bug that disables the only mechanism intended to address this.

Research Methodology

  • Static analysis of /opt/homebrew/lib/node_modules/@anthropic-ai/claude-code/cli.js (v2.1.72, 12MB minified)
  • Function-chain tracing through minified code: DD1 → XD1 → C36 → ie7 → I36 → keep_alive
  • Empirical testing: 7 sessions across 4 configurations (interactive, auto-RC, agent relay, SEND_KEEPALIVES=1)
  • Verified SEND_KEEPALIVES=1 doesn't fix idle death (refcount drops to 0 between turns)
  • Verified cli.js patch removing CLAUDE_CODE_REMOTE guard doesn't fix idle death (server ignores transport-level keepalives for TTL)
  • Verified on normal interactive session (remoteControlAtStartup) — not specific to agent/relay usage
  • Cross-referenced with 10+ existing GitHub issues (all symptom reports, no root cause analysis)

Metadata

Assignees

No one assigned

    Labels

    area:networking, bug (Something isn't working), has repro (Has detailed reproduction steps), platform:linux (Issue specifically occurs on Linux), platform:macos (Issue specifically occurs on macOS)
