Skip to content

Conversation

@dsarno
Copy link
Collaborator

@dsarno dsarno commented Jan 30, 2026

Summary

  • Add server-side ping/pong heartbeat to detect dead WebSocket connections (fixes silent connection failures on Windows)
  • Fix connection recovery after Unity domain reloads by increasing timeouts from 2s to 20s
  • Skip PluginHub session resolution for stdio transport (was causing unnecessary delays)
  • Add ClassVar annotations and lock protection for thread safety
  • Fix test flakiness (timer variance, Windows paths, LogAssert.Expect)

Problem

Two related issues were causing connection reliability problems:

  1. Issue PropertyConversion failures crash command dispatcher while telemetry continues #654: After running test suites or other operations that trigger Unity domain reloads, the dispatcher would become unreachable ("No Unity Editor instances found"). The root cause was that domain reloads take 10-20s, but the reconnection timeout was only 2s.

  2. Issue beta branch: when Claude Code reconnects, Unity session becomes "no session" even though handshake succeeds. This breaks the workflow completely. Logs attached. But Sometimes it works,and the bug only occurs in beta by using http to connect #643: On Windows, WebSocket connections can die silently (OSError 64: "The specified network name is no longer available") without either side being notified. Commands would fail with "Unity session not available" until Unity eventually detected the dead connection.

Solution

Ping/Pong Heartbeat (fixes #643)

  • Server sends {"type": "ping"} every 10 seconds
  • Unity responds with {"type": "pong", "session_id": "..."}
  • If no pong within 20 seconds, server closes connection proactively
  • Dead connections now detected within 20s instead of waiting indefinitely

Domain Reload Recovery (fixes #654)

  • Increased UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S from 2s to 20s
  • Increased UNITY_MCP_RELOAD_MAX_WAIT_S from 2s to 20s
  • Added _is_http_transport() check so stdio transport skips PluginHub entirely

Code Quality

  • Added ClassVar type annotations for mutable class attributes
  • Added lock protection for _last_pong dict access
  • Changed debug stack trace log from Warn to Debug level
  • Removed unused timing variable

Test Fixes

  • Fixed flaky async duration test (allow 20% timer variance on Windows)
  • Fixed Python test that cleared HOME env var breaking Path.home() on Windows
  • Skip Unix-path test on Windows (path separator difference)
  • Added LogAssert.Expect to PropertyConversion tests for expected error logs

Test plan

  • Verify ping/pong heartbeat works (check server logs for [Ping] messages)
  • Run full test suite and verify dispatcher stays reachable afterward
  • Verify commands are fast when Unity is connected (no delays)
  • All Python tests pass (563)
  • All Unity EditMode tests pass (510 passed, 54 skipped)
  • All Unity PlayMode tests pass (5)

Fixes #654
Fixes #643

🤖 Generated with Claude Code

dsarno and others added 3 commits January 30, 2026 15:28
On Windows, WebSocket connections can die silently (OSError 64) without
either side being notified. This causes commands to fail with "Unity
session not available" until Unity eventually detects the dead connection.

Changes:
- Add PingMessage model for server->client pings
- Add ping loop in PluginHub that sends pings every 10 seconds
- Track last pong time per session; close connection if no pong within 20s
- Include session_id in pong messages from Unity for server-side tracking
- Clean up debug/timing logs from Issue CoplayDev#654 investigation

The server will now proactively detect dead connections within 20 seconds
instead of waiting indefinitely for the next command to fail.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When Unity performs a domain reload (after script changes, test runs,
or large payload transfers), the MCP connection drops and needs to
reconnect. The previous reconnection timeout (2s) was too short for
domain reloads which can take 10-30s.

Changes:
- Increase UNITY_MCP_RELOAD_MAX_WAIT_S default from 2s to 30s
- Increase backoff cap when reloading from 0.8s to 5.0s
- Skip PluginHub session resolution for stdio transport (was causing
  unnecessary waits on every command)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…beat

# Conflicts:
#	Server/src/transport/plugin_hub.py
#	Server/src/transport/unity_transport.py
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Jan 30, 2026

Reviewer's Guide

Implements a server-initiated WebSocket ping/pong heartbeat to proactively detect dead connections, extends Unity-domain-reload-related timeouts from 2s to 30s (with updated backoff and documentation), skips PluginHub session resolution when using stdio transport, and adjusts logging/timing for better observability and reliability diagnostics.

Sequence diagram for WebSocket ping_pong heartbeat and dead connection detection

sequenceDiagram
    participant PluginHub as PluginHub_Server
    participant UnityClient as UnityWebSocketTransportClient

    rect rgb(230,230,250)
        Note over PluginHub,UnityClient: Registration establishes WebSocket session
        PluginHub->>UnityClient: Register handshake
        UnityClient-->>PluginHub: RegisteredMessage(session_id)
        PluginHub->>PluginHub: _connections[session_id] = websocket
        PluginHub->>PluginHub: _last_pong[session_id] = time.monotonic()
        PluginHub->>PluginHub: start _ping_loop(session_id, websocket)
    end

    loop Every PING_INTERVAL (10s)
        PluginHub->>PluginHub: sleep(PING_INTERVAL)
        PluginHub->>PluginHub: check session_id in _connections
        alt Session no longer connected
            PluginHub->>PluginHub: stop _ping_loop for session_id
        else Session still connected
            PluginHub->>PluginHub: elapsed = now - _last_pong[session_id]
            alt elapsed > PING_TIMEOUT (20s)
                PluginHub->>PluginHub: log [Ping] stale session
                PluginHub->>UnityClient: close(code=1001)
                PluginHub->>PluginHub: stop _ping_loop for session_id
            else Send ping
                PluginHub->>UnityClient: send_json(PingMessage{type: ping})
                alt Send succeeds
                    UnityClient-->>PluginHub: PongMessage{type: pong, session_id}
                    PluginHub->>PluginHub: _handle_pong(payload)
                    PluginHub->>PluginHub: registry.touch(session_id)
                    PluginHub->>PluginHub: _last_pong[session_id] = time.monotonic()
                else Send fails (e.g. OSError 64)
                    PluginHub->>PluginHub: log [Ping] send failure
                    PluginHub->>UnityClient: close(code=1006)
                    PluginHub->>PluginHub: stop _ping_loop for session_id
                end
            end
        end
    end

    Note over PluginHub,UnityClient: on_disconnect cancels ping task and clears _last_pong
Loading

Sequence diagram for HTTP vs stdio transport session resolution with PluginHub

sequenceDiagram
    actor MCPClient
    participant Server as UnityTransport_Server
    participant Middleware as UnityInstanceMiddleware
    participant PluginHub as PluginHub
    participant Stdio as LegacyUnityConnection

    MCPClient->>Server: Invoke command
    Server->>Middleware: _inject_unity_instance(context)

    alt HTTP transport
        Middleware->>Server: _is_http_transport() == true
        Middleware->>PluginHub: PluginHub.is_configured()
        alt PluginHub configured
            Middleware->>PluginHub: _resolve_session_id(unity_instance, user_id)
            PluginHub-->>Middleware: session_id (wait up to UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S <= 30s)
            Middleware->>Server: attach session_id to context
            Server->>PluginHub: route HTTP request via WebSocket session
            PluginHub-->>Server: MCPResponse
        else PluginHub not configured
            Middleware->>Server: fall back to active_instance only
            Server->>Stdio: send_command_with_retry(instance_id, command)
            Stdio-->>Server: Unity response
        end
    else Stdio transport
        Middleware->>Server: _is_http_transport() == false
        Note over Middleware,PluginHub: PluginHub session resolution is skipped entirely
        Middleware->>Server: use active_instance or instance_id only
        Server->>Stdio: send_command_with_retry(instance_id, command)
        Stdio-->>Server: Unity response
    end

    Server-->>MCPClient: MCPResponse
Loading

Class diagram for PluginHub heartbeat and related transport models

classDiagram
    class PluginHub {
        +KEEP_ALIVE_INTERVAL: int
        +SERVER_TIMEOUT: int
        +COMMAND_TIMEOUT: int
        +PING_INTERVAL: int
        +PING_TIMEOUT: int
        +FAST_FAIL_TIMEOUT: float
        +_connections: dict~str, WebSocket~
        +_pending: dict~str, dict~str, Any~~
        +_lock: asyncio.Lock
        +_loop: asyncio.AbstractEventLoop
        +_last_pong: dict~str, float~
        +_ping_tasks: dict~str, asyncio.Task~
        +on_disconnect(websocket: WebSocket, close_code: int) void
        +_handle_register(websocket: WebSocket, payload: RegisterMessage) Task
        +_handle_pong(payload: PongMessage) Task
        +_ping_loop(session_id: str, websocket: WebSocket) Task
        +_get_connection(session_id: str) Task~WebSocket~
        +_resolve_session_id(unity_instance: str, user_id: str, allow_unattached: bool) Task~str~
    }

    class PingMessage {
        +type: str
    }

    class PongMessage {
        +type: str
        +session_id: str
    }

    class WebSocketTransportClient {
        -_sessionId: string
        +SendPongAsync(token: CancellationToken) Task
        +SendJsonAsync(payload: JObject, token: CancellationToken) Task
        +HandleSocketClosureAsync(reason: string) Task
    }

    class TransportCommandDispatcher {
    }

    class PendingCommand {
        +PendingCommand(commandJson: string, completionSource: TaskCompletionSource, cancellationToken: CancellationToken, registration: CancellationTokenRegistration)
        +CommandJson: string
        +CompletionSource: TaskCompletionSource
        +CancellationToken: CancellationToken
        +CancellationRegistration: CancellationTokenRegistration
        +IsExecuting: bool
        +QueuedAt: DateTime
        +Dispose() void
    }

    class LegacyUnityConnection {
        +send_command(command_type: str, params: dict~str, Any~, max_attempts: int) dict
    }

    class UnityTransportModule {
        +_is_http_transport() bool
    }

    PluginHub --> PingMessage : uses
    PluginHub --> PongMessage : handles
    WebSocketTransportClient --> PongMessage : sends
    WebSocketTransportClient --> PingMessage : receives
    TransportCommandDispatcher *-- PendingCommand : contains
    LegacyUnityConnection <-- UnityTransportModule : used_by
    PluginHub <-- UnityTransportModule : used_by
Loading

Flow diagram for send_command_with_retry and reload timeout handling

flowchart TD
    A[Start send_command_with_retry] --> B[Get LegacyUnityConnection via get_unity_connection]
    B --> C[Read reload_max_retries and reload_retry_ms from config]
    C --> D[Read UNITY_MCP_RELOAD_MAX_WAIT_S env or default 30.0]
    D --> E[Clamp max_wait_s to 0..30]
    E --> F[Initialize waited = 0, attempt = 0]

    F --> G{waited < max_wait_s and
attempt <= max_retries}
    G -->|No| H[Log reload timed out
or retries exhausted]
    H --> I[Return last response or error]

    G -->|Yes| J[Attempt send_command
via LegacyUnityConnection]
    J --> K{Success?}

    K -->|Yes| L[Return successful response]

    K -->|No| M[Inspect status file /
fast_error flags]
    M --> N{status.reloading?}

    N -->|Yes| O[Set backoff cap = 5.0s
domain reload can take 10-30s]
    N -->|No and fast_error| P[Set backoff cap = 0.25s]
    N -->|No and not fast_error| Q[Set backoff cap = 1.0s]

    O --> R[Compute next backoff
and sleep]
    P --> R
    Q --> R

    R --> S[Increment waited and attempt]
    S --> G
Loading

File-Level Changes

Change Details Files
Add server-side WebSocket ping/pong heartbeat using per-session ping loops and last-pong tracking to close stale connections proactively.
  • Introduce PingMessage model and import it into the WebSocket plugin hub transport.
  • Track last pong timestamps and per-session ping tasks keyed by session_id in the plugin hub.
  • Start a ping loop upon plugin registration that periodically sends ping messages, checks for pong staleness against PING_TIMEOUT, and closes the connection on failure or inactivity.
  • Update pong handling to refresh last-pong timestamps and ensure proper cleanup of ping tasks and tracking data on disconnect, including cancelling pending commands with debug logging.
Server/src/transport/models.py
Server/src/transport/plugin_hub.py
MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs
Relax and document timeouts/backoff to handle long Unity domain reloads and improve retry behavior.
  • Increase UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S default from 2.0 to 30.0 seconds with clamping, updated comments, and warning messages to match the new default when misconfigured.
  • Increase UNITY_MCP_RELOAD_MAX_WAIT_S default from 2.0 to 30.0 seconds with clamping and updated warning messages, and add documentation comments describing domain reload behavior and future TODOs for deterministic detection.
  • Adjust reload backoff cap to 5.0 seconds when Unity reports reloading, so retries span typical 10–30 second reload windows.
Server/src/transport/plugin_hub.py
Server/src/transport/legacy/unity_connection.py
Skip PluginHub session resolution for stdio transport and normalize server logging configuration.
  • Add and use an _is_http_transport helper so unity_instance_middleware only resolves session IDs via PluginHub when using HTTP/WebSocket transport, avoiding unnecessary delays on stdio.
  • Switch some logger initializations to use name for more standard module-scoped logging configuration.
Server/src/transport/unity_instance_middleware.py
Server/src/transport/unity_transport.py
Server/src/transport/plugin_hub.py
Add timing/debug instrumentation around stdio command flows and WebSocket closure handling for diagnostics.
  • Instrument stdio send_command and send_command_with_retry with timing logs for connect, send, receive, and total retry duration to aid performance analysis.
  • Log when send_command cancels or retries during domain reload scenarios using the new timing information.
  • Add a stack-trace-emitting warning log in WebSocketTransportClient.HandleSocketClosureAsync to help identify code paths that trigger disconnections.
  • Extend PendingCommand to track when commands are queued to support future timing/diagnostic work.
Server/src/transport/legacy/unity_connection.py
MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs
MCPForUnity/Editor/Services/Transport/TransportCommandDispatcher.cs

Assessment against linked issues

Issue Objective Addressed Explanation
#643 Ensure HTTP/WebSocket transport detects and cleans up dead or stale Unity plugin connections (e.g., Windows OSError 64) so that when Claude Code reconnects, the Unity session does not end up in a "no session" state despite a successful handshake.
#643 Improve reconnection and session recovery behavior around Unity domain reloads and transport selection so that the MCP server no longer reports "Server no longer running; ending orphaned session" in normal reconnect workflows on the beta HTTP path.
#654 Add robust error handling around PropertyConversion so incompatible property values (e.g., wrong types) do not crash the command dispatcher or corrupt the instance registry, and subsequent tool calls can continue to function. The PR only modifies transport-level behavior (WebSocket ping/pong heartbeat, domain reload timeouts, stdio connection handling, and logging). It does not touch PropertyConversion, manage_components, or any code related to converting property values, so dispatcher crashes from conversion exceptions are not addressed.
#654 Extend PropertyConversion support for common Unity types, including array-based formats for Vector2 (and similar structs) and integer instance IDs for UnityEngine.Object references, so manage_components set_property calls accept these formats. No changes are made to the conversion logic for Vector2, UnityEngine.Object, or other Unity structs. The PR does not implement array parsing for Vector2 or instance ID handling for UnityEngine.Object, and it does not modify manage_components behavior.
#654 Improve telemetry/status reporting so that telemetry reflects the health of the command dispatcher (e.g., indicating dispatcher errors or unhealthy state instead of always reporting success). The PR does not modify telemetry models or status reporting. It focuses on connection reliability (ping/pong, timeouts, transport middleware) and adds some diagnostic logging, but does not introduce any new telemetry fields or change how dispatcher health is reported.

Possibly linked issues


Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

Walkthrough

Adds command enqueue timestamps and WebSocket pong/session payloads; logs socket closure stack traces; implements a server-initiated ping/pong heartbeat with per-session liveness tracking; broad timing instrumentation and increased Unity reload wait defaults to 20s; restricts PluginHub session resolution to HTTP transport; updates logger init and adds multiple Unity editor tests for property-conversion error handling.

Changes

Cohort / File(s) Summary
Command Dispatcher
MCPForUnity/Editor/Services/Transport/TransportCommandDispatcher.cs
Adds QueuedAt DateTime field/property to PendingCommand to record when commands are enqueued.
WebSocket Client
MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs
Pong payload now includes session_id; socket-closure handler logs stack trace for debugging.
Server: Heartbeat Models & Hub
Server/src/transport/models.py, Server/src/transport/plugin_hub.py
Adds PingMessage model and server-side ping loop with per-session _last_pong tracking, _ping_tasks, PING_INTERVAL (10s) and PING_TIMEOUT (20s), automatic close on liveness timeout, and ping lifecycle management on register/disconnect.
Server: Connection Timing & Reload Handling
Server/src/transport/legacy/unity_connection.py
Adds detailed timing instrumentation (connect/send/receive, per-attempt timings), increases read backoff cap, and raises default Unity reload max-wait handling to 20s (env override supported, clamped).
Server: Transport Session Validation
Server/src/transport/unity_instance_middleware.py
Guards PluginHub-based session validation to run only for HTTP transport; stdio transport skips PluginHub calls.
Server: Logger Init
Server/src/transport/unity_transport.py
Replaces static logger name with logging.getLogger(__name__).
Unity Editor Tests
TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/*.cs, .../*.cs.meta, TestProjects/UnityMCPTests/Assets/Tests/EditMode/Services/Server/ServerCommandBuilderTests.cs
Adds multiple tests for property-conversion error handling and an array-for-float test; adds Guard in a path-specific test to skip on Windows; includes Unity .meta files for new tests.

Sequence Diagram

sequenceDiagram
    actor Client as Unity Client
    participant WS as WebSocketTransportClient
    participant Hub as PluginHub (Server)
    participant Server as Python Server

    Client->>WS: Connect & register (session_id)
    WS->>Hub: Register session
    Hub->>Hub: set _last_pong[session] = now<br/>spawn _ping_loop(session)

    loop every PING_INTERVAL (10s)
        Hub->>WS: PingMessage
        WS->>Client: deliver ping
        Client->>WS: PongMessage (includes session_id)
        WS->>Hub: forward pong
        Hub->>Hub: update _last_pong[session]
    end

    alt no pong within PING_TIMEOUT (20s)
        Hub->>Hub: detect stale session
        Hub->>WS: close connection / unregister session
    end

    Client->>Server: Command requests (normal flow)
    Server->>Client: Responses
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #643 — reconnect/session becomes "no session": changes add HTTP-only session resolution and a server-side ping/pong that could address session liveness and reconnection detection.
  • #654 — PropertyConversion failures crash dispatcher: this PR adds extensive property-conversion tests covering error cases, which relate to diagnosing and preventing dispatcher crashes (tests added but core conversion fixes not necessarily applied).

Possibly related PRs

  • #440 — Related material-management feature set and tooling changes; shares surface with added editor tooling/tests.
  • #510 — Modifies unity_connection.py and retry/reload wait logic; closely aligned with extended timing and reload-wait changes in this PR.

Suggested labels

codex

Suggested reviewers

  • msanatan

Poem

🐰
I queued your command at break of day,
I ping the server when connections stray,
I time the sends and watch the clock,
I close the gaps where pongs do not knock,
Hoppity logs — the network’s feeling gay! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix WebSocket connection reliability and domain reload recovery' accurately summarizes the main changes: ping/pong heartbeat for dead connection detection and increased timeouts for domain reload recovery.
Description check ✅ Passed The PR description includes all required template sections: summary, problem explanation, solution details, and test plan with linked issues.
Linked Issues check ✅ Passed Code changes comprehensively address both linked issues: #643 (WebSocket detection via ping/pong heartbeat) and #654 (increased timeouts and graceful error handling for domain reloads).
Out of Scope Changes check ✅ Passed All changes are directly related to the stated objectives: ping/pong heartbeat, timeout increases, transport-specific logic, logging cleanup, and error handling tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The new [TIMING-STDIO] and WebSocket stack-trace logs are emitted at info/Warn level and on hot paths; consider downgrading to debug or guarding them behind a config flag to avoid flooding logs in normal operation.
  • The ping heartbeat constants (PING_INTERVAL, PING_TIMEOUT) are currently hard-coded; consider making them configurable (similar to the UNITY_MCP_* timeouts) so they can be tuned without code changes for different environments.
  • Access to _last_pong in the ping loop and _handle_pong is done without the shared lock while other session state uses the lock; for consistency and to avoid subtle concurrency issues, it may be safer to read/write _last_pong under the same lock as _connections.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `[TIMING-STDIO]` and WebSocket stack-trace logs are emitted at `info`/`Warn` level and on hot paths; consider downgrading to `debug` or guarding them behind a config flag to avoid flooding logs in normal operation.
- The ping heartbeat constants (`PING_INTERVAL`, `PING_TIMEOUT`) are currently hard-coded; consider making them configurable (similar to the `UNITY_MCP_*` timeouts) so they can be tuned without code changes for different environments.
- Access to `_last_pong` in the ping loop and `_handle_pong` is done without the shared lock while other session state uses the lock; for consistency and to avoid subtle concurrency issues, it may be safer to read/write `_last_pong` under the same lock as `_connections`.

## Individual Comments

### Comment 1
<location> `Server/src/transport/plugin_hub.py:466-475` </location>
<code_context>
+    async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider cleaning up `_ping_tasks`/`_last_pong` in `_ping_loop` to avoid relying solely on `on_disconnect`.

Relying on `on_disconnect` for cleanup means that in shutdown/exception edge cases (e.g. `_lock` set to `None`, `close()` failing, or `on_disconnect` not firing), the loop can exit the `while` and hit `finally` without removing this session from `_ping_tasks` or clearing `_last_pong`. Consider doing that cleanup in the `finally` block so the ping loop fully manages its own lifecycle and avoids stale entries.

Suggested implementation:

```python
            await registry.touch(session_id)
            # Record last pong time for staleness detection
            cls._last_pong[session_id] = time.monotonic()

    @classmethod
    def _cleanup_ping_session(cls, session_id: str) -> None:
        """Cleanup ping tracking state for a session.

        This is used by the ping loop's finally block to fully manage its own
        lifecycle, and can also be safely called from on_disconnect as needed.
        """
        # Remove this session's ping task and staleness tracking, if present.
        # Using pop(..., None) makes this idempotent and safe to call multiple times.
        cls._ping_tasks.pop(session_id, None)
        cls._last_pong.pop(session_id, None)

    @classmethod
    async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:

```

To fully implement your suggestion, the following additional changes are needed in the same file:

1. Wrap the body of `_ping_loop` in a `try`/`finally` and call the new helper in the `finally` block, so the ping loop always cleans up its own state:

```python
    @classmethod
    async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:
        """Server-initiated ping loop to detect dead connections.

        Sends periodic pings to the Unity client. If no pong is received within
        PING_TIMEOUT seconds, the connection is considered dead and closed.
        This helps detect connections that die silently (e.g., Windows OSError 64).
        """
        logger.debug(f"[Ping] Starting ping loop for session {session_id}")
        try:
            while True:
                await asyncio.sleep(cls.PING_INTERVAL)
                # ... existing ping / timeout logic remains unchanged ...
        finally:
            # Ensure we don't leave stale entries if on_disconnect never fires
            cls._cleanup_ping_session(session_id)
```

2. Anywhere else you’re cleaning up `_ping_tasks` / `_last_pong` (likely in `on_disconnect`), update that code to call `cls._cleanup_ping_session(session_id)` instead of manipulating `_ping_tasks` / `_last_pong` directly. Because `_cleanup_ping_session` uses `pop(..., None)`, calling it from both `on_disconnect` and `_ping_loop` is safe and idempotent.

These changes ensure that the ping loop fully manages its own lifecycle and avoids stale entries even in shutdown/exception edge cases where `on_disconnect` might not execute.
</issue_to_address>

### Comment 2
<location> `MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs:654-660` </location>
<code_context>

         private async Task HandleSocketClosureAsync(string reason)
         {
+            // DEBUG: Capture stack trace to identify what triggers disconnection (Issue #654 investigation)
+            var stackTrace = new System.Diagnostics.StackTrace(true);
+            McpLog.Warn($"[WebSocket] HandleSocketClosureAsync called. Reason: {reason}\nStack trace:\n{stackTrace}");
+
             if (_lifecycleCts == null || _lifecycleCts.IsCancellationRequested)
</code_context>

<issue_to_address>
**suggestion:** The WARN-level stack trace log on every socket closure may be too noisy for normal usage.

Since disconnects are often expected, consider either lowering this log to `Info`/`Debug`, gating it behind a diagnostic flag or `#if DEBUG`, or restricting the stack trace to unexpected closure reasons so you keep the investigation signal without overwhelming normal logs.

```suggestion
        private async Task HandleSocketClosureAsync(string reason)
        {
#if DEBUG
            // DEBUG: Capture stack trace to identify what triggers disconnection (Issue #654 investigation)
            var stackTrace = new System.Diagnostics.StackTrace(true);
            McpLog.Debug($"[WebSocket] HandleSocketClosureAsync called. Reason: {reason}\nStack trace:\n{stackTrace}");
#endif

            if (_lifecycleCts == null || _lifecycleCts.IsCancellationRequested)
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +466 to +478
async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:
"""Server-initiated ping loop to detect dead connections.

Sends periodic pings to the Unity client. If no pong is received within
PING_TIMEOUT seconds, the connection is considered dead and closed.
This helps detect connections that die silently (e.g., Windows OSError 64).
"""
logger.debug(f"[Ping] Starting ping loop for session {session_id}")
try:
while True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Consider cleaning up _ping_tasks/_last_pong in _ping_loop to avoid relying solely on on_disconnect.

Relying on on_disconnect for cleanup means that in shutdown/exception edge cases (e.g. _lock set to None, close() failing, or on_disconnect not firing), the loop can exit the while and hit finally without removing this session from _ping_tasks or clearing _last_pong. Consider doing that cleanup in the finally block so the ping loop fully manages its own lifecycle and avoids stale entries.

Suggested implementation:

            await registry.touch(session_id)
            # Record last pong time for staleness detection
            cls._last_pong[session_id] = time.monotonic()

    @classmethod
    def _cleanup_ping_session(cls, session_id: str) -> None:
        """Cleanup ping tracking state for a session.

        This is used by the ping loop's finally block to fully manage its own
        lifecycle, and can also be safely called from on_disconnect as needed.
        """
        # Remove this session's ping task and staleness tracking, if present.
        # Using pop(..., None) makes this idempotent and safe to call multiple times.
        cls._ping_tasks.pop(session_id, None)
        cls._last_pong.pop(session_id, None)

    @classmethod
    async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:

To fully implement your suggestion, the following additional changes are needed in the same file:

  1. Wrap the body of _ping_loop in a try/finally and call the new helper in the finally block, so the ping loop always cleans up its own state:
    @classmethod
    async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:
        """Server-initiated ping loop to detect dead connections.

        Sends periodic pings to the Unity client. If no pong is received within
        PING_TIMEOUT seconds, the connection is considered dead and closed.
        This helps detect connections that die silently (e.g., Windows OSError 64).
        """
        logger.debug(f"[Ping] Starting ping loop for session {session_id}")
        try:
            while True:
                await asyncio.sleep(cls.PING_INTERVAL)
                # ... existing ping / timeout logic remains unchanged ...
        finally:
            # Ensure we don't leave stale entries if on_disconnect never fires
            cls._cleanup_ping_session(session_id)
  1. Anywhere else you’re cleaning up _ping_tasks / _last_pong (likely in on_disconnect), update that code to call cls._cleanup_ping_session(session_id) instead of manipulating _ping_tasks / _last_pong directly. Because _cleanup_ping_session uses pop(..., None), calling it from both on_disconnect and _ping_loop is safe and idempotent.

These changes ensure that the ping loop fully manages its own lifecycle and avoids stale entries even in shutdown/exception edge cases where on_disconnect might not execute.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@Server/src/transport/legacy/unity_connection.py`:
- Around line 244-245: Remove the unused variable t0 in the send_command
function: delete the assignment "t0 = time.time()" (or replace it with a
meaningful use if you intended to compute duration) and keep the
logger.info("[TIMING-STDIO] send_command START: command=%s", command_type) line
unchanged; locate this inside the send_command method in unity_connection.py to
make the edit.
🧹 Nitpick comments (2)
MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs (1)

656-658: Debug logging uses warning level unconditionally.

The comment indicates this is for "DEBUG" / "Issue #654 investigation", but McpLog.Warn logs unconditionally at warning level. This could be noisy in production. Consider using McpLog.Debug which respects the debug logging preference, or remove this investigation code before merging.

🔧 Proposed fix: Use conditional debug logging
-            // DEBUG: Capture stack trace to identify what triggers disconnection (Issue `#654` investigation)
-            var stackTrace = new System.Diagnostics.StackTrace(true);
-            McpLog.Warn($"[WebSocket] HandleSocketClosureAsync called. Reason: {reason}\nStack trace:\n{stackTrace}");
+            // Capture stack trace for debugging disconnection triggers
+            var stackTrace = new System.Diagnostics.StackTrace(true);
+            McpLog.Debug($"[WebSocket] HandleSocketClosureAsync called. Reason: {reason}\nStack trace:\n{stackTrace}");
Server/src/transport/plugin_hub.py (1)

86-89: Add ClassVar type annotation for mutable class attributes.

Static analysis correctly identifies that mutable class attributes should be annotated with typing.ClassVar to clarify they are shared across instances and prevent accidental shadowing.

🔧 Proposed fix
+from typing import Any, ClassVar
+
 class PluginHub(WebSocketEndpoint):
     ...
-    # session_id -> last pong timestamp (monotonic)
-    _last_pong: dict[str, float] = {}
-    # session_id -> ping task
-    _ping_tasks: dict[str, asyncio.Task] = {}
+    # session_id -> last pong timestamp (monotonic)
+    _last_pong: ClassVar[dict[str, float]] = {}
+    # session_id -> ping task
+    _ping_tasks: ClassVar[dict[str, asyncio.Task]] = {}

Comment on lines 244 to 245
t0 = time.time()
logger.info("[TIMING-STDIO] send_command START: command=%s", command_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused variable t0.

The variable t0 is assigned but never used. The timing is logged but the stored value isn't referenced for duration calculation later.

🔧 Proposed fix
-        t0 = time.time()
         logger.info("[TIMING-STDIO] send_command START: command=%s", command_type)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
t0 = time.time()
logger.info("[TIMING-STDIO] send_command START: command=%s", command_type)
logger.info("[TIMING-STDIO] send_command START: command=%s", command_type)
🧰 Tools
🪛 Ruff (0.14.14)

[error] 244-244: Local variable t0 is assigned to but never used

Remove assignment to unused variable t0

(F841)

🤖 Prompt for AI Agents
In `@Server/src/transport/legacy/unity_connection.py` around lines 244 - 245,
Remove the unused variable t0 in the send_command function: delete the
assignment "t0 = time.time()" (or replace it with a meaningful use if you
intended to compute duration) and keep the logger.info("[TIMING-STDIO]
send_command START: command=%s", command_type) line unchanged; locate this
inside the send_command method in unity_connection.py to make the edit.

- Add server-side ping loop to detect dead WebSocket connections
- Include session_id in pong messages for tracking
- Reduce domain reload timeout from 30s to 20s
- Add ClassVar annotations for mutable class attributes
- Add lock protection for _last_pong access
- Change debug stack trace log from Warn to Debug level
- Remove unused TIMING-STDIO variable
- Fix flaky async duration test (allow 20% timer variance)
- Fix Python test that cleared HOME env var on Windows
- Skip Unix-path test on Windows (path separator difference)
- Add LogAssert.Expect to PropertyConversion tests

Fixes CoplayDev#654, CoplayDev#643

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@dsarno dsarno force-pushed the fix/ping-pong-heartbeat branch from 2bfffe7 to 0c4c5cc Compare January 31, 2026 00:38
@dsarno dsarno merged commit 4d96941 into CoplayDev:beta Jan 31, 2026
1 of 2 checks passed
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@Server/src/transport/legacy/unity_connection.py`:
- Around line 780-800: Update the reload wait default and clamp to 30s: change
the environment default for UNITY_MCP_RELOAD_MAX_WAIT_S from "20.0" to "30.0",
adjust any hard-coded fallback max_wait_s values (currently set to 20.0) to
30.0, and modify the clamp statement that enforces the range to use 30.0 as the
upper bound (max_wait_s = max(0.0, min(max_wait_s, 30.0))). Also update the
logger.warning message text (and any nearby comments referencing 20s) to reflect
the new 30s default so messages and docs match the new cap; ensure you reference
the same variable name max_wait_s and the env var UNITY_MCP_RELOAD_MAX_WAIT_S in
your changes.

In `@Server/src/transport/plugin_hub.py`:
- Around line 492-519: In the exception handler for failed ping sends (the block
that creates PingMessage, calls websocket.send_json and handles send_ex),
replace the prohibited close code 1006 with 1001 when calling websocket.close
and ensure any exception thrown by websocket.close is logged instead of
swallowed; update the logger call to include context (session_id and the close
exception) and reference the same symbols (PingMessage, websocket.close,
cls.PING_TIMEOUT, logger) so the error path logs close failures for debugging.

In
`@TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs`:
- Around line 207-235: ManageEditor.HandleCommand currently lacks a case for the
"telemetry_status" action so unknown actions hit the default ErrorResponse;
update ManageEditor.HandleCommand (ManageEditor.cs) to handle "telemetry_status"
by either implementing a new branch that gathers and returns the telemetry
health fields (dispatcher status, success flags, etc.) or routing the action to
the existing telemetry handler (if one exists), ensuring the switch or dispatch
logic recognizes "telemetry_status" and returns a success response object used
by the test.
🧹 Nitpick comments (4)
TestProjects/UnityMCPTests/Assets/Tests/EditMode/Services/Server/ServerCommandBuilderTests.cs (1)

113-118: Use Assert.Ignore to mark the test as skipped on Windows instead of Assert.Pass.

Assert.Pass marks the test as passed in test reports, which masks that it never actually ran. Assert.Ignore correctly records it as skipped, keeping test results accurate.

Suggested change
             if (UnityEngine.Application.platform == UnityEngine.RuntimePlatform.WindowsEditor)
             {
-                Assert.Pass("Skipped on Windows - use BuildUvPathFromUvx_WindowsPath_ConvertsCorrectly instead");
-                return;
+                Assert.Ignore("Skipped on Windows - use BuildUvPathFromUvx_WindowsPath_ConvertsCorrectly instead");
             }
TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversion_ArrayForFloat_Test.cs (1)

36-56: Assert error response and unchanged property.

Right now the test only checks for a non-null result, so it would still pass if the command incorrectly succeeds. Consider asserting the error response and that spatialBlend remains unchanged.

♻️ Suggested assertion tightening
 var result = ManageComponents.HandleCommand(setPropertyParams);
-Assert.IsNotNull(result, "Should return a result");
+Assert.IsNotNull(result, "Should return a result");
+if (result is ErrorResponse error)
+{
+    Assert.IsFalse(error.Success, "Should report failure for incompatible value");
+}
+else
+{
+    Assert.Fail($"Expected ErrorResponse but got {result.GetType().Name}");
+}
+Assert.AreEqual(0f, audioSource.spatialBlend, "spatialBlend should remain unchanged");
TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs (2)

40-66: Enforce the expected ErrorResponse.

The test allows a success response to pass; that undermines the “graceful error” assertion. Consider requiring an ErrorResponse with Success == false.

♻️ Suggested assertion tightening
 var result = ManageComponents.HandleCommand(setPropertyParams);
-
-// Main test: should return a result without crashing
-Assert.IsNotNull(result, "Should return a result, not crash dispatcher");
-
-// If it's an ErrorResponse, verify it properly reports failure
-if (result is ErrorResponse errorResp)
-{
-    Assert.IsFalse(errorResp.Success, "Should report failure for incompatible type");
-}
+var errorResp = result as ErrorResponse;
+Assert.IsNotNull(errorResp, "Expected ErrorResponse for incompatible type");
+Assert.IsFalse(errorResp.Success, "Should report failure for incompatible type");

241-280: Test currently passes in all branches.

Assert.Pass is invoked in every path, so this test can’t fail even if behavior regresses. Consider asserting the intended contract once defined (e.g., no throw + null result, or a specific error path).

Comment on lines +780 to +800
# Default to 20s to handle domain reloads (which can take 10-20s after tests or script changes).
#
# NOTE: This wait can impact agentic workflows where domain reloads happen
# frequently (e.g., after test runs, script compilation). The 20s default
# balances handling slow reloads vs. avoiding unnecessary delays.
#
# TODO: Make this more deterministic by detecting Unity's actual reload state
# rather than blindly waiting up to 20s. See Issue #657.
#
# Configurable via: UNITY_MCP_RELOAD_MAX_WAIT_S (default: 20.0, max: 20.0)
try:
max_wait_s = float(os.environ.get(
"UNITY_MCP_RELOAD_MAX_WAIT_S", "2.0"))
"UNITY_MCP_RELOAD_MAX_WAIT_S", "20.0"))
except ValueError as e:
raw_val = os.environ.get("UNITY_MCP_RELOAD_MAX_WAIT_S", "2.0")
raw_val = os.environ.get("UNITY_MCP_RELOAD_MAX_WAIT_S", "20.0")
logger.warning(
"Invalid UNITY_MCP_RELOAD_MAX_WAIT_S=%r, using default 2.0: %s",
"Invalid UNITY_MCP_RELOAD_MAX_WAIT_S=%r, using default 20.0: %s",
raw_val, e)
max_wait_s = 2.0
# Clamp to [0, 30] to prevent misconfiguration from causing excessive waits
max_wait_s = max(0.0, min(max_wait_s, 30.0))
max_wait_s = 20.0
# Clamp to [0, 20] to prevent misconfiguration from causing excessive waits
max_wait_s = max(0.0, min(max_wait_s, 20.0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reload wait is still capped at 20s—objective calls for 30s.
Issue #654 notes reloads can take 10–30s, but the default, clamp, and log text still enforce 20s, which can prematurely abort retries. Please bump the default and cap (and update messaging) to 30s.

♻️ Proposed fix
-    # Default to 20s to handle domain reloads (which can take 10-20s after tests or script changes).
+    # Default to 30s to handle domain reloads (which can take 10-30s after tests or script changes).
@@
-    # Configurable via: UNITY_MCP_RELOAD_MAX_WAIT_S (default: 20.0, max: 20.0)
+    # Configurable via: UNITY_MCP_RELOAD_MAX_WAIT_S (default: 30.0, max: 30.0)
     try:
-        max_wait_s = float(os.environ.get(
-            "UNITY_MCP_RELOAD_MAX_WAIT_S", "20.0"))
+        max_wait_s = float(os.environ.get(
+            "UNITY_MCP_RELOAD_MAX_WAIT_S", "30.0"))
     except ValueError as e:
-        raw_val = os.environ.get("UNITY_MCP_RELOAD_MAX_WAIT_S", "20.0")
+        raw_val = os.environ.get("UNITY_MCP_RELOAD_MAX_WAIT_S", "30.0")
         logger.warning(
-            "Invalid UNITY_MCP_RELOAD_MAX_WAIT_S=%r, using default 20.0: %s",
+            "Invalid UNITY_MCP_RELOAD_MAX_WAIT_S=%r, using default 30.0: %s",
             raw_val, e)
-        max_wait_s = 20.0
-    # Clamp to [0, 20] to prevent misconfiguration from causing excessive waits
-    max_wait_s = max(0.0, min(max_wait_s, 20.0))
+        max_wait_s = 30.0
+    # Clamp to [0, 30] to prevent misconfiguration from causing excessive waits
+    max_wait_s = max(0.0, min(max_wait_s, 30.0))
🤖 Prompt for AI Agents
In `@Server/src/transport/legacy/unity_connection.py` around lines 780 - 800,
Update the reload wait default and clamp to 30s: change the environment default
for UNITY_MCP_RELOAD_MAX_WAIT_S from "20.0" to "30.0", adjust any hard-coded
fallback max_wait_s values (currently set to 20.0) to 30.0, and modify the clamp
statement that enforces the range to use 30.0 as the upper bound (max_wait_s =
max(0.0, min(max_wait_s, 30.0))). Also update the logger.warning message text
(and any nearby comments referencing 20s) to reflect the new 30s default so
messages and docs match the new cap; ensure you reference the same variable name
max_wait_s and the env var UNITY_MCP_RELOAD_MAX_WAIT_S in your changes.

Comment on lines +492 to +519
# Check staleness: has it been too long since we got a pong?
elapsed = time.monotonic() - last_pong
if elapsed > cls.PING_TIMEOUT:
logger.warning(
f"[Ping] Session {session_id} stale: no pong for {elapsed:.1f}s "
f"(timeout={cls.PING_TIMEOUT}s). Closing connection."
)
try:
await websocket.close(code=1001) # Going away
except Exception as close_ex:
logger.debug(f"[Ping] Error closing stale websocket: {close_ex}")
break

# Send a ping to the client
try:
ping_msg = PingMessage()
await websocket.send_json(ping_msg.model_dump())
logger.debug(f"[Ping] Sent ping to session {session_id}")
except Exception as send_ex:
# Send failed - connection is dead
logger.warning(
f"[Ping] Failed to send ping to session {session_id}: {send_ex}. "
"Connection likely dead."
)
try:
await websocket.close(code=1006) # Abnormal closure
except Exception:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

RFC 6455 WebSocket close code 1006 must not be sent close frame specification

💡 Result:

In RFC 6455, WebSocket close code 1006 (“abnormal closure”) is reserved and must not be sent on the wire:

  • “1006 is a reserved value and MUST NOT be set as a status code in a Close control frame by an endpoint.” It is only for local/reporting use to indicate the connection closed abnormally (e.g., the TCP connection dropped) without sending or receiving a Close control frame. [1]
  • RFC 6455 also defines that if the connection is closed and no Close control frame was received, then the implementation considers the connection close code to be 1006 (again, internal/observed state—not a transmitted close frame). [2]

So: you will never legitimately see a peer send a Close frame with code 1006; implementations use it to report “abnormal closure” when the closing handshake didn’t happen. [1][2]

Sources:
[1] RFC 6455, Section 7.4.1 “Defined Status Codes” (1006) https://datatracker.ietf.org/doc/html/rfc6455
[2] RFC 6455, Section 7.1.5 “The WebSocket Connection Close Code” (1006 when no Close frame received) https://datatracker.ietf.org/doc/html/rfc6455


🏁 Script executed:

# Find and examine the file mentioned in the review
fd -t f "plugin_hub.py" | head -5

Repository: CoplayDev/unity-mcp

Length of output: 97


🏁 Script executed:

# Read the specific lines mentioned in the review
sed -n '492,519p' Server/src/transport/plugin_hub.py

Repository: CoplayDev/unity-mcp

Length of output: 1455


🏁 Script executed:

# Let's see more context around these lines to understand the full function
sed -n '470,530p' Server/src/transport/plugin_hub.py

Repository: CoplayDev/unity-mcp

Length of output: 2947


Replace close code 1006 with 1001 and log close failures.

Line 517 uses close code 1006, which RFC 6455 explicitly reserves and forbids sending on the wire—it must only be used internally to report abnormal closure when no Close frame was exchanged. Sending it violates the specification and can cause protocol violations. Additionally, the exception handler at lines 518–519 silently swallows errors without logging, hiding diagnostics crucial for debugging connection failures.

🔧 Proposed fix
                        await websocket.close(code=1006)  # Abnormal closure
-                    except Exception:
-                        pass
+                        await websocket.close(code=1001)  # Going away
+                    except Exception as close_ex:
+                        logger.debug(f"[Ping] Error closing websocket after send failure: {close_ex}")
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 501-501: Do not catch blind exception: Exception

(BLE001)


[warning] 510-510: Do not catch blind exception: Exception

(BLE001)


[error] 518-519: try-except-pass detected, consider logging the exception

(S110)


[warning] 518-518: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@Server/src/transport/plugin_hub.py` around lines 492 - 519, In the exception
handler for failed ping sends (the block that creates PingMessage, calls
websocket.send_json and handles send_ex), replace the prohibited close code 1006
with 1001 when calling websocket.close and ensure any exception thrown by
websocket.close is logged instead of swallowed; update the logger call to
include context (session_id and the close exception) and reference the same
symbols (PingMessage, websocket.close, cls.PING_TIMEOUT, logger) so the error
path logs close failures for debugging.

Comment on lines +207 to +235
[Test]
public void ManageEditor_TelemetryStatus_ReportsAccurateHealth()
{
// Trigger multiple conversion failures first
var audioSource = testGameObject.AddComponent<AudioSource>();

for (int i = 0; i < 3; i++)
{
var badParam = new JObject
{
["action"] = "set_property",
["target"] = testGameObject.name,
["componentType"] = "AudioSource",
["property"] = "clip",
["value"] = i * 1000 // bad
};
ManageComponents.HandleCommand(badParam);
}

// Now check telemetry
var telemetryParams = new JObject { ["action"] = "telemetry_status" };
var telemetryResult = ManageEditor.HandleCommand(telemetryParams);

Assert.IsNotNull(telemetryResult, "Telemetry should return result");

// NOTE: Issue #654 noted that telemetry returns success even when dispatcher is dead.
// If telemetry returns success, that's the actual current behavior (which may be a problem).
// This test just documents what happens.
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n --type=cs -C3 'telemetry_status'

Repository: CoplayDev/unity-mcp

Length of output: 8475


🏁 Script executed:

rg -n --type=cs 'telemetry_status' -B5 -A10 | grep -E '(case|HandleCommand|dispatcher|health|success)' | head -50

Repository: CoplayDev/unity-mcp

Length of output: 4682


🏁 Script executed:

# Find the ManageEditor implementation
fd -e cs -type f | xargs rg -l 'ManageEditor' | head -10

Repository: CoplayDev/unity-mcp

Length of output: 490


🏁 Script executed:

cat -n MCPForUnity/Editor/Tools/ManageEditor.cs | head -100

Repository: CoplayDev/unity-mcp

Length of output: 4734


🏁 Script executed:

# Search for telemetry_status handler specifically
rg -n 'telemetry_status|case "telemetry' MCPForUnity/Editor/Tools/ManageEditor.cs -A 20

Repository: CoplayDev/unity-mcp

Length of output: 45


🏁 Script executed:

# Search for telemetry in the entire codebase
rg -n 'case.*telemetry' --type=cs -A 15

Repository: CoplayDev/unity-mcp

Length of output: 45


🏁 Script executed:

# Get line count to see full file
wc -l MCPForUnity/Editor/Tools/ManageEditor.cs

Repository: CoplayDev/unity-mcp

Length of output: 107


🏁 Script executed:

cat -n MCPForUnity/Editor/Tools/ManageEditor.cs | tail -300

Repository: CoplayDev/unity-mcp

Length of output: 15288


🏁 Script executed:

# Search for where telemetry_status is handled - might be in a different file
rg -rn 'telemetry_status' --type=cs -B 5 -A 15 | grep -E '(case|return|new|telemetry_status|dispatcher|health)' | head -60

Repository: CoplayDev/unity-mcp

Length of output: 4442


🏁 Script executed:

# Search for dispatcher or health related code
rg -rn 'dispatcher' --type=cs | head -20

Repository: CoplayDev/unity-mcp

Length of output: 2117


🏁 Script executed:

cat -n TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs | sed -n '207,235p'

Repository: CoplayDev/unity-mcp

Length of output: 1502


🏁 Script executed:

# Search for where telemetry might be handled as a resource
fd -type f -name '*.cs' | xargs grep -l 'telemetry' | head -20

Repository: CoplayDev/unity-mcp

Length of output: 233


🏁 Script executed:

# Search for telemetry in all files
find . -name '*.cs' -type f | xargs grep -l telemetry

Repository: CoplayDev/unity-mcp

Length of output: 9188


🏁 Script executed:

# Look for resource implementations
find . -name '*Resource*.cs' -type f | head -10

Repository: CoplayDev/unity-mcp

Length of output: 184


🏁 Script executed:

cat -n MCPForUnity/Editor/Helpers/TelemetryHelper.cs

Repository: CoplayDev/unity-mcp

Length of output: 9323


🏁 Script executed:

# Look for how telemetry_status action is actually handled
find . -name '*.cs' -type f -exec grep -l 'telemetry_status.*action\|HandleCommand.*telemetry' {} \; | head -20

Repository: CoplayDev/unity-mcp

Length of output: 265


🏁 Script executed:

# Search for response classes that might include dispatcher health
find . -name '*Response*.cs' -type f | head -20

Repository: CoplayDev/unity-mcp

Length of output: 1298


🏁 Script executed:

# Look for a McpServer or handler that might route telemetry_status
find . -name '*Server*.cs' -o -name '*Handler*.cs' -o -name '*Dispatcher*.cs' | head -20

Repository: CoplayDev/unity-mcp

Length of output: 922


🏁 Script executed:

cat -n MCPForUnity/Editor/Services/Transport/TransportCommandDispatcher.cs

Repository: CoplayDev/unity-mcp

Length of output: 17315


🏁 Script executed:

# Find CommandRegistry
find . -name 'CommandRegistry.cs' -type f

Repository: CoplayDev/unity-mcp

Length of output: 108


🏁 Script executed:

# Search for telemetry_status registration
grep -rn 'telemetry_status' --include='*.cs' | grep -v Test | grep -v '.Trim()'

Repository: CoplayDev/unity-mcp

Length of output: 45


🏁 Script executed:

cat -n MCPForUnity/Editor/Tools/CommandRegistry.cs

Repository: CoplayDev/unity-mcp

Length of output: 18443


🏁 Script executed:

# Find telemetry_status as a resource or tool
find . -name '*.cs' -type f -exec grep -l 'telemetry_status\|TelemetryStatus' {} \; | grep -v Test | grep -v '.Trim()'

Repository: CoplayDev/unity-mcp

Length of output: 45


🏁 Script executed:

# Search for McpForUnityResource or McpForUnityTool with telemetry context
grep -rn 'McpForUnityResource\|McpForUnityTool' --include='*.cs' | grep -v Test | head -20

Repository: CoplayDev/unity-mcp

Length of output: 2277


🏁 Script executed:

# Search for all HandleCommand methods across the codebase
grep -rn 'public static.*HandleCommand' --include='*.cs' | grep -v Test

Repository: CoplayDev/unity-mcp

Length of output: 3557


🏁 Script executed:

# Look for files that might contain telemetry action
find . -path ./TestProjects -prune -o -name '*.cs' -type f -print | xargs grep -l 'action.*telemetry\|telemetry.*action' | head -20

Repository: CoplayDev/unity-mcp

Length of output: 45


🏁 Script executed:

# Read the full test to see what it actually checks
cat -n TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs | sed -n '200,250p'

Repository: CoplayDev/unity-mcp

Length of output: 2599


🏁 Script executed:

# Let's also check if ManageEditor has a default case that might handle telemetry
grep -n 'default:' MCPForUnity/Editor/Tools/ManageEditor.cs -A 10

Repository: CoplayDev/unity-mcp

Length of output: 616


Test will fail with unknown action error before reaching health assertions.

The test calls ManageEditor.HandleCommand with action telemetry_status, but this action is not implemented in ManageEditor.cs (lines 50-141). The switch statement only handles: play, pause, stop, set_active_tool, add_tag, remove_tag, add_layer, remove_layer. Any unknown action hits the default case and returns an ErrorResponse, so the test will fail immediately rather than proceed to check telemetry response content.

Either:

  1. Implement telemetry_status handling in ManageEditor, or
  2. Route telemetry_status to the correct handler if it exists elsewhere.

Once the action is properly implemented, the suggestion to assert on health fields (dispatcher status, success flags) aligns with issue #654's intent and should be added.

🤖 Prompt for AI Agents
In
`@TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs`
around lines 207 - 235, ManageEditor.HandleCommand currently lacks a case for
the "telemetry_status" action so unknown actions hit the default ErrorResponse;
update ManageEditor.HandleCommand (ManageEditor.cs) to handle "telemetry_status"
by either implementing a new branch that gathers and returns the telemetry
health fields (dispatcher status, success flags, etc.) or routing the action to
the existing telemetry handler (if one exists), ensuring the switch or dispatch
logic recognizes "telemetry_status" and returns a success response object used
by the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant