Skip to content

Conversation

@Kriys94
Copy link
Contributor

@Kriys94 Kriys94 commented Oct 24, 2025

Explanation

Current State

The BackendWebSocketService tests were experiencing flakiness after introducing the new ExponentialBackoff logic using Cockatiel. The random jitter in the exponential backoff algorithm was causing non-deterministic test behavior, particularly in tests that relied on precise timing control with jest.advanceTimersByTime().

Symptoms:

  • Tests would randomly fail with state mismatches (e.g., expecting "error" but receiving "connecting")
  • Reconnection attempt counts would be inconsistent
  • Coverage would fluctuate between runs (sometimes 99.39%, sometimes 100%)
  • Specific lines (456, 552-553) would intermittently miss coverage

Solution

This PR fixes all flaky tests by making the exponential backoff behavior deterministic and adjusting test timing to prevent race conditions.

1. Added Math.random() Mock to 9 Tests

Mocked Math.random() to return 0 in tests that involve reconnection logic. This ensures Cockatiel's exponential backoff delays are predictable and consistent across test runs:

  • should handle abnormal WebSocket close by triggering reconnection
  • should reset reconnect attempts after stable connection
  • should skip connect when reconnect timer is already scheduled
  • should handle WebSocket onclose during connection phase
  • should clear connection timeout in handleClose when timeout occurs then close fires
  • should not schedule multiple reconnects when scheduleReconnect called multiple times
  • should force reconnection and schedule connect
  • should skip forceReconnection when reconnect timer is already scheduled
  • should stop reconnection when isEnabled returns false during scheduled reconnect

2. Timing Adjustments to Prevent Race Conditions

Changed completeAsyncOperations(10) to completeAsyncOperations(0) in 6 tests to check state immediately after operations without allowing timers to execute prematurely:

  • should handle WebSocket onclose during connection phase - After simulateClose
  • should clear connection timeout in handleClose when timeout occurs then close fires - After manual close trigger
  • should not schedule multiple reconnects when scheduleReconnect called multiple times - After both close events
  • should force reconnection and schedule connect - After forceReconnection call
  • should skip forceReconnection when reconnect timer is already scheduled - After simulateError
  • should skip connect when reconnect timer is already scheduled - After simulateClose

3. Coverage Stability

Fixed intermittent coverage issues for:

  • Line 456: Early return in connect() when reconnect timer is already scheduled
  • Lines 552-553: Early return in forceReconnection() when reconnect timer is already scheduled

These lines are now consistently covered by ensuring reconnect timers remain scheduled during assertions.

Technical Details

Exponential Backoff with Jitter:

  • Cockatiel uses Math.random() to add jitter to prevent thundering herd problems
  • Without mocking, delays vary unpredictably: e.g., 500ms could become 250-750ms
  • With Math.random() = 0, delays are deterministic and tests can advance time precisely

Timing Philosophy:

  • completeAsyncOperations(0): Flush promises only, no time advancement
  • Used when we want to check state immediately after scheduling timers
  • Prevents race conditions where timers execute before assertions run

Testing

Verified stability with extensive testing:

  • Individual flaky tests run 100 times each: ✅ All passed
  • Full test suite run 100 times: ✅ Consistent 100% coverage
  • All tests pass reliably without flakiness

References

  • Related to ExponentialBackoff implementation in BackendWebSocketService
  • Ensures reliable CI/CD test execution
  • Fixes intermittent test failures that were blocking development

Checklist

  • I've updated the test suite for new or updated code as appropriate
  • I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
  • I've communicated my changes to consumers by updating changelogs for packages I've changed, highlighting breaking changes as necessary
  • I've prepared draft pull requests for clients and consumer packages to resolve any breaking changes

Note

Stabilizes BackendWebSocketService reconnection tests by mocking Math.random jitter and using immediate async flushing to avoid timer races, with minor state/idempotency assertions added.

  • Tests (packages/core-backend/src/BackendWebSocketService.test.ts)
    • Mock Math.random() to 0 in reconnection-related tests to make Cockatiel backoff jitter deterministic.
    • Replace completeAsyncOperations(10) with completeAsyncOperations(0) where assertions must occur before timers run (e.g., after simulateClose/simulateError, post-forceReconnection).
    • Add targeted assertions and checks:
      • Verify CONNECTING state during connection-phase close handling.
      • Ensure early-return/idempotency when reconnect timer exists (connect/forceReconnection, multiple scheduleReconnect calls).
    • Result: deterministic timing, eliminated flakiness, stable reconnect attempt counts and coverage.

Written by Cursor Bugbot for commit 6333eda. This will update automatically on new commits. Configure here.

@Kriys94 Kriys94 marked this pull request as ready for review October 24, 2025 13:47
@Kriys94 Kriys94 requested review from a team as code owners October 24, 2025 13:47
// Close during connection phase
mockWs.simulateClose(1006, 'Connection failed');
await completeAsyncOperations(10);
await completeAsyncOperations(0);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: WebSocket Test Flakiness Due to Timer Advancement

The 'should handle WebSocket onclose during connection phase' test advances timers by 10ms before asserting the CONNECTING state. This introduces a race condition, as the connection could progress beyond CONNECTING during this time, leading to flakiness and undermining the PR's goal of deterministic timing.

Fix in Cursor Fix in Web

@Kriys94 Kriys94 merged commit 77db789 into main Oct 24, 2025
257 checks passed
@Kriys94 Kriys94 deleted the fix/flakytest branch October 24, 2025 13:50
Gudahtt added a commit that referenced this pull request Oct 24, 2025
…r/multichain-transactions-controller

* origin/main: (35 commits)
  feat: `JsonRpcEngineV2` (#6176)
  Release 641.0.0 (#6940)
  feat: Add transaction emulation actions (#6935)
  Release/640.0.0 (#6934)
  fix(core-backend): control randomness to fix flaky test (#6936)
  chore: Add `@metamask-previews/*` to NPM age gate exceptions (#6937)
  Release/639.0.0 (#6931)
  feat: make getCryptoApproveTransactionParams synchronous (#6930)
  feat: add new actions to `KeyringController` (#6928)
  feat: add `getAccounts` to `AccountsController` (#6927)
  chore: remove `Monad Mainnet` single call balance contract and add into account v4 (#6929)
  Release/638.0.0 (#6923)
  fix: Downgrade `multiformats` to `^9.9.0` to avoid ESM-only dependency (#6920)
  Release/637.0.0 (#6919)
  feat(account-tree-controller): add callbacks for hidden and pinned data (#6910)
  Release 636.0.0 (#6918)
  fix(core-backend): reconnection logic (#6861)
  fix:  Tx state listener and signature coverage (#6906)
  Release/635.0.0 (#6917)
  fix(base-controller): add TypeScript declaration file for legacy module resolution (#6915)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants