-
Notifications
You must be signed in to change notification settings - Fork 0
Bound Unity reload/session waits and refine retry handling #121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… stall prevention - Add TestRunnerNoThrottle.cs: Sets editor to 'No Throttling' mode during test runs with SessionState persistence across domain reload - Add run_tests_async and get_test_job tools for non-blocking test execution - Add TestJobManager for async test job tracking with progress monitoring - Add ForceSynchronousImport to all AssetDatabase.Refresh() calls to prevent stalls - Mark DomainReloadResilienceTests as [Explicit] with documentation explaining the test infrastructure limitation (internal coroutine waits vs MCP socket polling) - MCP workflow is unaffected - socket messages provide external stimulus that keeps Unity responsive even when backgrounded
- Remove unused Newtonsoft.Json.Linq import from TestJobManager - Add throttling to SessionState persistence (once per second) to reduce overhead - Critical job state changes (start/finish) still persist immediately - Fix duplicate XML summary tag in DomainReloadResilienceTests
- Add run_tests_async and get_test_job to main README tools list - Document background stall limitation for domain reload tests in DEV readme
Run [Explicit] domain_reload tests in their own job using -testCategory
Combines into single job with two test steps to reuse cached Library
- Fix TOCTOU race in TestJobManager.StartJob (single lock scope for check-and-set) - Store TestRunnerApi reference with HideAndDontSave to prevent GC/serialization issues
- run_tests_async is now marked as preferred for long-running suites - run_tests description notes it blocks and suggests async alternative
…sults - manage_asset, manage_gameobject, manage_scene now check preflight return value and propagate busy/retry signals to clients (fixes Sourcery #1) - TestJobManager.FinalizeCurrentJobFromRunFinished now sets job status to Failed when resultPayload.Failed > 0, not always Succeeded (fixes Sourcery #2)
When 'Force fresh server install' is enabled, uvx uses --no-cache --refresh which rebuilds the package and takes significantly longer to start. - Increase timeout from 10s to 45s when dev mode is enabled - Add informative log message explaining the longer startup time - Show actual timeout value in warning message
Apply same logic as FinalizeCurrentJobFromRunFinished: check result.Failed > 0 to correctly mark jobs as Failed when tests fail, even in the fallback path when RunFinished callback is not delivered.
WalkthroughThe pull request adds enhanced retry and reload handling across the Unity MCP transport layer. Changes include a new meta file for a configurator, a response reason extraction utility in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR addresses a critical performance issue where generic
The changes maintain backward compatibility by keeping Confidence Score: 5/5
Sequence DiagramsequenceDiagram
participant Client
participant send_command_with_retry
participant Unity as Unity Editor
participant _extract_response_reason
Client->>send_command_with_retry: send_command_with_retry(command)
send_command_with_retry->>send_command_with_retry: Get max_wait_s from UNITY_MCP_RELOAD_MAX_WAIT_S (2.0s)
send_command_with_retry->>Unity: conn.send_command()
Unity-->>send_command_with_retry: response (reloading)
send_command_with_retry->>_extract_response_reason: Extract reason from response
_extract_response_reason-->>send_command_with_retry: "reloading"
alt Is Reloading AND within time budget
loop While _is_reloading_response(response) AND retries < max_retries AND elapsed < max_wait_s
send_command_with_retry->>send_command_with_retry: Clamp sleep to [50-250]ms
send_command_with_retry->>send_command_with_retry: Sleep(sleep_ms)
send_command_with_retry->>Unity: conn.send_command()
Unity-->>send_command_with_retry: response (still reloading)
send_command_with_retry->>_extract_response_reason: Update reason
end
else Time budget exceeded
send_command_with_retry-->>Client: MCPResponse(hint=retry, data.reason=reloading)
else Reload completed
send_command_with_retry-->>Client: Final response from Unity
end
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
Server/src/services/tools/refresh_unity.py (1)
50-56: Minor redundancy in type check.Line 51 has a redundant
isinstance(data, dict)check since line 50 already guaranteesdatais a dict (either fromresponse.get("data")if it's a dict, or{}).The recovery logic correctly avoids setting
recovered_from_disconnect = Truefor reload or no-session scenarios.🔎 Suggested simplification
data = response.get("data") if isinstance(response.get("data"), dict) else {} - reason = data.get("reason") if isinstance(data, dict) else None + reason = data.get("reason")Server/src/transport/plugin_hub.py (1)
364-370: Blind exception catch is acceptable here but could be narrower.The static analysis flags
except Exception(BLE001). While this is typically discouraged, here it's a reasonable defensive pattern for environment variable parsing where the only goal is to fall back to a default. However, narrowing toValueErrorwould be more precise sincefloat()only raisesValueErrorfor invalid strings.🔎 Narrower exception handling
try: max_wait_s = float( os.environ.get("UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S", "2.0")) - except Exception: + except (TypeError, ValueError): max_wait_s = 2.0Server/src/transport/legacy/unity_connection.py (2)
754-759: Same blind exception pattern as plugin_hub.py.For consistency, consider narrowing the exception type here as well.
🔎 Narrower exception handling
try: max_wait_s = float(os.environ.get( "UNITY_MCP_RELOAD_MAX_WAIT_S", "2.0")) - except Exception: + except (TypeError, ValueError): max_wait_s = 2.0
850-860: Consider preserving more error context.The
except Exceptionon line 859 catches all exceptions and converts them to a genericMCPResponse(success=False, error=str(e)). This loses exception type information that might be useful for callers to distinguish between:
- Connection errors (retryable)
- Configuration errors (not retryable)
- Unexpected bugs (should surface)
Consider either re-raising certain exception types or including the exception class name in the error message.
🔎 Include exception type in error
except Exception as e: - return MCPResponse(success=False, error=str(e)) + return MCPResponse(success=False, error=f"{type(e).__name__}: {e}")
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
MCPForUnity/Editor/Clients/Configurators/CherryStudioConfigurator.cs.metaServer/src/services/tools/refresh_unity.pyServer/src/transport/legacy/unity_connection.pyServer/src/transport/plugin_hub.py
🧰 Additional context used
🧬 Code graph analysis (1)
Server/src/transport/plugin_hub.py (1)
Server/tests/integration/test_helpers.py (1)
model_dump(10-13)
🪛 Ruff (0.14.10)
Server/src/transport/legacy/unity_connection.py
757-757: Do not catch blind exception: Exception
(BLE001)
Server/src/transport/plugin_hub.py
368-368: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (7)
MCPForUnity/Editor/Clients/Configurators/CherryStudioConfigurator.cs.meta (1)
1-11: LGTM!The Unity metadata file is properly formatted with the correct structure and standard MonoImporter configuration. The guid is valid and all required fields are present.
Server/src/transport/plugin_hub.py (3)
453-468: LGTM - Graceful handling for unavailable Unity session.The error handling correctly catches the "no Unity plugins connected" case and returns a structured retry response with
reason: "no_unity_session". This aligns with the PR's goal of providing fast retry signals instead of blocking.Note: String-based exception matching (line 456) is fragile but acceptable here since both the raise and catch are in the same file. Consider defining a custom exception class in a future refactor for better type safety.
475-501: LGTM - Bounded readiness probe for fast-path commands.The ping-based readiness check correctly:
- Excludes
pingitself to avoid recursion- Clamps the wait window to a reasonable 0–30s range
- Uses the
while...elsepattern to return a retry hint when the deadline expiresThis prevents long blocking waits when Unity is unfocused or reloading.
414-427: Good use of parameterized logging.The switch from f-strings to
%-style logging is a best practice—it defers string formatting until the log record is actually emitted, avoiding unnecessary work when the log level is filtered out.Server/src/transport/legacy/unity_connection.py (3)
689-714: Well-structured reason extraction logic.The
_extract_response_reasonfunction correctly handles:
MCPResponseobjects withdata.reasonattribute- Raw dict payloads with
stateordata.reasonfields- Fallback to checking
message/errorfor "reload" substringThis centralizes the logic that was previously scattered across multiple checks.
763-799: LGTM - Enhanced retry loop with bounded waiting.The retry logic correctly:
- Tracks wait start time and enforces
max_wait_sbudget- Extracts
retry_after_msfrom response data for dynamic backoff- Clamps sleep duration to [50ms, 250ms] to prevent both busy-spinning and excessive delays
- Provides detailed debug logging for troubleshooting
Minor observation:
reasonis extracted on line 764 and 799 for logging, while_is_reloading_responseon line 765 re-extracts internally. This is slightly redundant but keeps the code readable.
801-825: LGTM - Proper handling when reload wait budget is exceeded.When the bounded wait expires while Unity is still reloading, the function correctly returns an
MCPResponsewith:
hint: "retry"for client-side retry logicdata.reason: "reloading"for reason-based flow in callers likerefresh_unityretry_after_msclamped to [50, 250] for reasonable backoffThis prevents the ~10s stalls mentioned in the PR objectives.
Address code review feedback: - Catch ValueError specifically (instead of broad Exception) when parsing UNITY_MCP_RELOAD_MAX_WAIT_S, UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S, and UNITY_MCP_SESSION_READY_WAIT_SECONDS, with logging for easier diagnosis of misconfiguration - Normalize reason values to lowercase in _extract_response_reason() to avoid case-sensitive mismatches in comparisons - Simplify refresh_unity.py by removing redundant isinstance check and reusing _extract_response_reason instead of duplicating reason parsing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
Server/src/transport/legacy/unity_connection.py (1)
772-833: LGTM! Bounded retry logic is well-structured with comprehensive logging.The implementation correctly:
- Initializes the wait timer on the first reload response
- Enforces the
max_wait_sbudget by checking elapsed time- Respects Unity's
retry_after_mshints while clamping sleep to[50, 250]ms- Provides detailed debug logging at each stage (start, retry, completion, budget exceeded)
- Returns a structured
MCPResponsewhen the wait budget is exceededOptional: Consider rounding instead of truncating delay_ms.
Line 796 converts
delay_mstoint(), which truncates rather than rounds. For consistency with typical time handling:- sleep_ms = max(50, min(int(delay_ms), 250)) + sleep_ms = max(50, min(round(delay_ms), 250))This is a minor refinement and unlikely to impact behavior materially.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
Server/src/services/tools/refresh_unity.pyServer/src/transport/legacy/unity_connection.pyServer/src/transport/plugin_hub.py
🧰 Additional context used
🧬 Code graph analysis (2)
Server/src/transport/legacy/unity_connection.py (1)
Server/src/transport/plugin_hub.py (1)
send_command(133-210)
Server/src/services/tools/refresh_unity.py (1)
Server/src/transport/legacy/unity_connection.py (2)
async_send_command_with_retry(837-869)_extract_response_reason(689-719)
🔇 Additional comments (5)
Server/src/services/tools/refresh_unity.py (2)
13-13: LGTM!The imports are correctly used:
async_send_command_with_retryis passed to the transport layer and_extract_response_reasonis used to extract failure reasons for refined retry handling.
50-55: LGTM! Improved retry classification.The refined logic correctly distinguishes true disconnect recovery from transient busy states (
"reloading","no_unity_session"). This prevents false-positive recovery signals and aligns with the PR's goal of accurate retry classification.Server/src/transport/plugin_hub.py (2)
405-443: LGTM! Well-structured bounded wait with good observability.The bounded waiting logic is correctly implemented with clear debug logging that includes instance context. The wait deadline is enforced properly, and the logging will help diagnose session resolution issues.
479-509: LGTM! Fast-fail readiness probe is well-implemented.The bounded readiness probe correctly:
- Clamps
UNITY_MCP_SESSION_READY_WAIT_SECONDSto[0, 30]- Handles invalid configuration values gracefully
- Uses ping to verify main-thread readiness
- Returns a structured retry response when the session isn't ready
Server/src/transport/legacy/unity_connection.py (1)
689-728: LGTM! Robust reason extraction with good type safety.The
_extract_response_reasonfunction correctly:
- Handles both
MCPResponseand raw dict payloads- Safely extracts reasons from nested
data.reasonfields- Falls back to checking message/error text for "reload" keywords
- Returns lowercase strings for case-insensitive comparison by callers
- Uses safe attribute access and None-coalescing
The
_is_reloading_responsewrapper provides a clean boolean interface.
- Add upper bound (30s) to UNITY_MCP_RELOAD_MAX_WAIT_S to prevent misconfiguration from causing excessive waits - Add upper bound (30s) to UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S for consistency with readiness probe - Introduce NoUnitySessionError custom exception to replace fragile string matching in send_command_for_instance Addresses code review feedback for bounded retry policy PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
Server/src/transport/legacy/unity_connection.py (2)
791-796: Consider handling MCPResponse for retry_after_ms extraction.The
retry_after_msextraction only handlesdictresponses. Sinceconn.send_commandcan returnMCPResponse(line 242), responses of that type won't have theirdata.retry_after_msinspected here.While this may not be an issue in practice (Unity responses are dicts, MCPResponse is for local errors), aligning with
_extract_response_reason's dual handling would be more consistent.🔎 Optional: extend retry_after_ms extraction to MCPResponse
delay_ms = retry_ms - if isinstance(response, dict): - retry_after = response.get("retry_after_ms") - if retry_after is None and isinstance(response.get("data"), dict): - retry_after = response["data"].get("retry_after_ms") - if retry_after is not None: - delay_ms = int(retry_after) + retry_after = None + if isinstance(response, dict): + retry_after = response.get("retry_after_ms") + if retry_after is None and isinstance(response.get("data"), dict): + retry_after = response["data"].get("retry_after_ms") + elif isinstance(response, MCPResponse): + data = getattr(response, "data", None) + if isinstance(data, dict): + retry_after = data.get("retry_after_ms") + if retry_after is not None: + delay_ms = int(retry_after)
860-870: Broad exception swallowing may hide important errors.The sync
send_command_with_retrypropagates exceptions (e.g.,ConnectionError,ValueError), but this async wrapper catches all exceptions and converts them toMCPResponse. This asymmetry could mask programming errors or connection issues that callers might want to handle differently.🔎 Suggested: be more selective about caught exceptions
try: import asyncio # local import to avoid mandatory asyncio dependency for sync callers if loop is None: loop = asyncio.get_running_loop() return await loop.run_in_executor( None, lambda: send_command_with_retry( command_type, params, instance_id=instance_id, max_retries=max_retries, retry_ms=retry_ms), ) - except Exception as e: + except (ConnectionError, TimeoutError, OSError) as e: + # Convert recoverable transport errors to MCPResponse return MCPResponse(success=False, error=str(e)) + # Let programming errors (ValueError, TypeError, etc.) propagateAlternatively, if broad catching is intentional for resilience, consider logging at warning/error level so issues are observable.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
Server/src/transport/legacy/unity_connection.pyServer/src/transport/plugin_hub.py
🧰 Additional context used
🧬 Code graph analysis (1)
Server/src/transport/plugin_hub.py (1)
Server/tests/integration/test_helpers.py (3)
warning(39-40)error(46-47)model_dump(10-13)
🪛 Ruff (0.14.10)
Server/src/transport/plugin_hub.py
451-451: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (10)
Server/src/transport/legacy/unity_connection.py (4)
689-719: LGTM! Clean extraction logic with proper normalization.The helper correctly extracts and normalizes reason strings from both
MCPResponseobjects and raw dicts. The lowercase normalization enables consistent case-insensitive comparisons.
722-728: LGTM! Improved specificity over the previous broad check.The refactor to use
_extract_response_reasonmakes the reloading detection more explicit and maintainable. Only"reloading"reasons or messages containing "reload" are now treated as reloads, avoiding misclassification of unrelated conditions.
759-769: Environment variable handling is well-implemented.Good defensive coding with
ValueErrorhandling, warning logs for invalid values, and the[0, 30]clamp to prevent misconfiguration.
820-828: Structured retry response on budget exhaustion looks good.The fallback
MCPResponsecorrectly signals a retryable state withhint="retry"and includesreasonand boundedretry_after_msin data, enabling downstream consumers (likerefresh_unity) to handle these responses appropriately.Server/src/transport/plugin_hub.py (6)
37-39: Good addition of custom exception type.This replaces fragile string matching with a proper exception type, making the error handling in
send_command_for_instancemore robust and maintainable.
368-381: Environment variable handling is well-implemented and consistent.Good defensive coding with:
ValueErrorhandling with warning logs- Upper bound clamping to
[0, 30]to prevent misconfiguration- Sleep clamping to
[0.05, 0.25]aligns with unity_connection.py's 50-250ms range
410-436: LGTM! Improved deadline-based waiting with clear logging.Using
time.monotonic()for deadline calculation is correct for measuring elapsed time. The logging clearly indicates wait start, restoration, and the instance context.
443-451: Correctly raises custom exception after wait exhaustion.The static analysis hint (TRY003) about message placement is a minor style concern—the message is descriptive and not excessively long. The current approach is acceptable.
462-475: Clean exception handling with structured retry response.The
NoUnitySessionErrorhandler correctly returns a retryableMCPResponsewithreason: "no_unity_session"andretry_after_ms: 250, enablingrefresh_unityto avoid misclassifying this as a recovered disconnect.
485-490: Consistent env var validation pattern.The error handling mirrors the pattern used for
UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S, maintaining consistency across the codebase.
Motivation
hint="retry"responses as full Unity reloads.tests_runningdo not trigger reload wait loops.refresh_unitydoes not mark reload/no-session retries as recovered-from-disconnect.Description
_is_reloading_responselogic with_extract_response_reasonand a stricter_is_reloading_responsethat only treats explicitreloadingreasons or messages containing "reload" as reloads.UNITY_MCP_RELOAD_MAX_WAIT_Sand micro-retry clamping (50–250ms), with debug logging and a fallbackMCPResponsebusy/retry payload when the budget is exceeded.PluginHub._resolve_session_idusingUNITY_MCP_SESSION_RESOLVE_MAX_WAIT_Sand return a retryable busy response (data.reason: "no_unity_session") instead of waiting the previous long default and raising.refresh_unityto inspectresponse.data.reasonand avoid settingrecovered_from_disconnect=Truefor"reloading"or"no_unity_session"retry responses.Testing
send_command_with_retryandPluginHub._resolve_session_idto validate fast-fail behavior and classification (not included here).Codex Task
Summary by CodeRabbit
Bug Fixes
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.