Skip to content

HTTP transport ignores retry_on_reload and lacks stale connection detection #795

@dsarno

Description

@dsarno

Summary

The HTTP/WebSocket transport path (PluginHub) does not respect retry_on_reload=False and lacks the stale connection detection that the stdio path now has. This means script mutations sent via HTTP clients (Claude Code, Cursor, etc.) can be retried during domain reload, causing duplicate execution — the same class of bug fixed for stdio in PR #792.

Root Cause

send_with_unity_instance() discards all kwargs (including retry_on_reload) when routing through the HTTP path. It extracts command_type and params, then calls PluginHub.send_command_for_instance() which has no knowledge of retry semantics.

# unity_transport.py — HTTP branch
command_type = args[0]
params = args[1] if len(args) > 1 else kwargs.get("params")
# ... retry_on_reload is silently dropped
raw = await PluginHub.send_command_for_instance(unity_instance, command_type, params)

Specific Issues

Issue Stdio (fixed) HTTP (broken)
Stale connection detection before send _ensure_live_connection() MSG_PEEK None — detected only via 30s timeout
retry_on_reload=False respected Yes No — silently dropped
Reconnect wait during mutations Skipped when retry_on_reload=False Always waits 20s for session reconnect
Duplicate mutation risk Low High
Connection-lost verification SHA comparison after reconnect Timeout → client may retry blindly

Suggested Fix

  1. Pass retry_on_reload through the HTTP pathPluginHub.send_command_for_instance() should accept and respect it. When False, skip _resolve_session_id() wait and don't retry on disconnect.

  2. Add pre-send connection validation — equivalent to stdio's _ensure_live_connection(), probe WebSocket liveness before sending mutations.

  3. Unify the transport retry contract — Both stdio and HTTP paths implement their own retry/reconnect/reload-wait logic independently, which means every fix has to be applied twice and debugged twice. Consider extracting a shared TransportPolicy or similar abstraction that both paths use:

    • Stale connection detection (pre-send health check)
    • Reload-aware retry semantics (retry_on_reload flag)
    • Post-mutation wait-for-ready
    • Connection-lost-after-send detection

    This doesn't need to be a deep abstraction — even a shared set of helper functions that both paths call at the right points would prevent the current situation where stdio gets a fix and HTTP silently remains broken.

Context

Discovered during PR #792 (fix script edit tools retrying non-idempotent commands during reload). The stdio path was fixed end-to-end; this issue tracks the equivalent work for HTTP.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions