sirtimid (Contributor) commented Feb 4, 2026

Closes #806

Summary

  • Add clearRemoteSeqState method to kernel store for clearing sequence state without removing the remote relationship
  • Add handlePeerRestart method to RemoteHandle that resets all state when a peer restarts with a new incarnation ID
  • Add OnIncarnationChange callback type and wire it through the transport layer, platform services, and RemoteManager
  • Clear permanent failure status on user-initiated sends to allow reconnection to previously-failed peers
  • Add RPC handler for browser runtime to notify kernel of incarnation changes

Behavior

| Scenario | Behavior |
| --- | --- |
| Send to permanently-failed peer | Clear failure, dial, handshake, proceed |
| Same incarnation ID | Normal operation, pending messages may still be ACKed |
| Different incarnation ID | Reset RemoteHandle state, reject pending, start fresh |
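
The "same" versus "different" incarnation ID rows come down to comparing the ID seen in a handshake with the last one recorded for that peer. A minimal sketch of that comparison follows. `IncarnationTracker`, `observe`, and the `first-contact` outcome are hypothetical names chosen for illustration and are not taken from this PR:

```typescript
// Illustrative only: these names are assumptions, not the PR's actual API.
type HandshakeOutcome = 'first-contact' | 'same-incarnation' | 'peer-restarted';

class IncarnationTracker {
  // Last incarnation ID observed for each peer.
  private readonly known = new Map<string, string>();

  // Record the ID seen in a handshake and report how it relates to the
  // previously recorded one for the same peer.
  observe(peerId: string, incarnationId: string): HandshakeOutcome {
    const previous = this.known.get(peerId);
    this.known.set(peerId, incarnationId);
    if (previous === undefined) {
      return 'first-contact';
    }
    return previous === incarnationId ? 'same-incarnation' : 'peer-restarted';
  }
}
```

In this sketch only the `peer-restarted` outcome would lead to a state reset; the other two leave pending messages alone so they can still be ACKed.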

Test plan

  • Unit tests for clearRemoteSeqState in store
  • Unit tests for handlePeerRestart in RemoteHandle
  • Unit tests for incarnation change callback wiring
  • Unit tests for clearing permanent failure on send
  • All existing tests pass

🤖 Generated with Claude Code


Note

Medium Risk
Touches core remote-transport/handshake and message sequencing/persistence; bugs here could cause dropped/rejected messages or reconnection regressions, though changes are gated behind explicit incarnation-change detection and are well-covered by tests.

Overview
Remote comms now distinguishes “peer restart” from “connection give-up”. A new OnIncarnationChange callback is plumbed end-to-end (kernel RemoteManager → initRemoteComms → PlatformServices → initTransport, plus browser RPC remoteIncarnationChange) so the kernel can reset a remote when the handshake detects a changed incarnation ID.

Remote state is reset on restart instead of treated as a permanent failure. RemoteHandle.handlePeerRestart() rejects pending work, clears timers, resets seq/ack counters, and clears persisted per-remote sequence/pending-message state via new store method clearRemoteSeqState, allowing fresh messaging after a peer reboot.
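
The four reset steps listed above (reject pending work, clear timers, reset seq/ack counters, clear persisted state) can be sketched roughly as below. This is a simplified model under stated assumptions, not the actual `RemoteHandle`: the field names, the `SeqStore` interface, and the starting counter values are all hypothetical, and only the method names `handlePeerRestart` and `clearRemoteSeqState` come from the PR.

```typescript
// Illustrative only: a toy stand-in for RemoteHandle, not the real class.
type PendingMessage = { seq: number; reject: (reason: Error) => void };

interface SeqStore {
  // Assumed shape: clears persisted seq/pending state for one remote.
  clearRemoteSeqState(remoteId: string): void;
}

class SketchRemoteHandle {
  remoteId: string;
  store: SeqStore;
  nextSendSeq = 1; // assumed initial value
  highestReceivedSeq = 0; // assumed initial value
  pending: PendingMessage[] = [];
  ackTimer: ReturnType<typeof setTimeout> | undefined = undefined;

  constructor(remoteId: string, store: SeqStore) {
    this.remoteId = remoteId;
    this.store = store;
  }

  handlePeerRestart(): void {
    // 1. Reject pending work: the restarted peer has lost its state.
    const reason = new Error(`Remote ${this.remoteId} restarted`);
    for (const entry of this.pending) {
      entry.reject(reason);
    }
    this.pending = [];

    // 2. Clear timers tied to the old incarnation.
    if (this.ackTimer !== undefined) {
      clearTimeout(this.ackTimer);
      this.ackTimer = undefined;
    }

    // 3. Reset in-memory sequence/ack counters.
    this.nextSendSeq = 1;
    this.highestReceivedSeq = 0;

    // 4. Clear the persisted counterpart, keeping the remote itself.
    this.store.clearRemoteSeqState(this.remoteId);
  }
}
```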

Transport reconnection/send behavior is adjusted for restarts. Outbound handshake now returns { success, incarnationChanged }, reconnection logic consumes this, user-initiated sends clear “permanently failed” status before dialing, and sends abort when an incarnation change is detected to avoid delivering stale queued messages.
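
As a rough illustration of "user-initiated sends clear permanently failed status before dialing", the sketch below models the failure bookkeeping as a set of peer IDs. `PeerFailureRegistry`, `mayDial`, and `SendOrigin` are hypothetical names, and the assumption that automatic reconnection still respects the failed status is mine, not something the PR states:

```typescript
// Illustrative only: names and the 'reconnect' branch are assumptions.
type SendOrigin = 'user' | 'reconnect';

class PeerFailureRegistry {
  private readonly failed = new Set<string>();

  markPermanentlyFailed(peerId: string): void {
    this.failed.add(peerId);
  }

  isPermanentlyFailed(peerId: string): boolean {
    return this.failed.has(peerId);
  }

  clear(peerId: string): void {
    this.failed.delete(peerId);
  }
}

// Returns true when the caller may go on to dial the peer.
function mayDial(
  registry: PeerFailureRegistry,
  peerId: string,
  origin: SendOrigin,
): boolean {
  if (origin === 'user') {
    // An explicit send is treated as a fresh request to reach the peer,
    // so an earlier give-up no longer blocks it.
    registry.clear(peerId);
    return true;
  }
  // Assumed: background reconnection does not override a give-up.
  return !registry.isPermanentlyFailed(peerId);
}
```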

Tests are updated/added across kernel store, remote handle/manager, transport lifecycle, nodejs+browser platform services, and e2e coverage; .playwright-mcp/ is added to .gitignore.

Written by Cursor Bugbot for commit 94e4569. This will update automatically on new commits.

…nation ID detection

- Add `clearRemoteSeqState` method to kernel store for clearing sequence state
  without removing the remote relationship
- Add `handlePeerRestart` method to RemoteHandle that resets all state when a
  peer restarts with a new incarnation ID
- Add `OnIncarnationChange` callback type and wire it through the transport
  layer, platform services, and RemoteManager
- Clear permanent failure status on user-initiated sends to allow reconnection
  to previously-failed peers
- Add RPC handler for browser runtime to notify kernel of incarnation changes
- Update all test files to account for new callback parameters

This enables proper handling when a peer restarts: the incarnation ID handshake
detects the restart, and the callback resets RemoteHandle state (sequence numbers,
pending messages) for a fresh start.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
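
For reviewers, the idea of clearing sequence state "without removing the remote relationship" can be pictured with a toy key-value store. The key layout (`remote.<id>`, `remoteSeq.<id>.*`, `remotePending.<id>.*`) and the `SketchKernelStore` class are assumptions made for this sketch; the real kernel store's keys and API may look quite different:

```typescript
// Illustrative only: the key scheme here is invented for the example.
class SketchKernelStore {
  private readonly kv = new Map<string, string>();

  set(key: string, value: string): void {
    this.kv.set(key, value);
  }

  get(key: string): string | undefined {
    return this.kv.get(key);
  }

  // Delete per-remote sequence and pending-message entries, but leave the
  // `remote.<id>` record so the relationship itself survives the reset.
  clearRemoteSeqState(remoteId: string): void {
    const prefixes = [`remoteSeq.${remoteId}.`, `remotePending.${remoteId}.`];
    for (const key of Array.from(this.kv.keys())) {
      if (prefixes.some((prefix) => key.startsWith(prefix))) {
        this.kv.delete(key);
      }
    }
  }
}
```

The trailing dot in each prefix keeps a reset of `r1` from touching keys that belong to a remote named `r10`.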
@sirtimid sirtimid requested a review from a team as a code owner February 4, 2026 15:11
github-actions bot (Contributor) commented Feb 4, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 78.21% (⬆️ +0.07%) | 6131 / 7839 |
| 🔵 | Statements | 78.18% (⬆️ +0.08%) | 6230 / 7968 |
| 🔵 | Functions | 76.5% (⬇️ -0.07%) | 1563 / 2043 |
| 🔵 | Branches | 78.32% (⬇️ -0.06%) | 2226 / 2842 |

File Coverage
| File | Stmts | Branches | Functions | Lines | Uncovered Lines |
| --- | --- | --- | --- | --- | --- |
| Changed Files | | | | | |
| packages/kernel-browser-runtime/src/PlatformServicesClient.ts | 92.3% (⬇️ -4.36%) | 75.86% (⬇️ -5.62%) | 85% (⬇️ -4.47%) | 92.3% (⬇️ -4.36%) | 110, 132, 314-317 |
| packages/kernel-browser-runtime/src/PlatformServicesServer.ts | 93.02% (⬇️ -2.21%) | 88.88% (🟰 ±0%) | 73.91% (⬇️ -7.04%) | 93.02% (⬇️ -2.21%) | 140, 163, 194, 405-422 |
| packages/nodejs/src/kernel/PlatformServices.ts | 93.97% (🟰 ±0%) | 90.47% (🟰 ±0%) | 88.23% (🟰 ±0%) | 93.97% (🟰 ±0%) | 144-147, 185, 212-217 |
| packages/ocap-kernel/src/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/types.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/remotes/types.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/remotes/kernel/RemoteHandle.ts | 87.89% (⬆️ +3.47%) | 82.14% (⬆️ +0.44%) | 86.95% (⬆️ +2.51%) | 88.18% (⬆️ +3.06%) | 338, 357-400, 453, 492, 499-501, 542-555, 898 |
| packages/ocap-kernel/src/remotes/kernel/RemoteManager.ts | 98.55% (⬆️ +0.25%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 98.55% (⬆️ +0.25%) | 133 |
| packages/ocap-kernel/src/remotes/kernel/remote-comms.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/remotes/platform/reconnection-lifecycle.ts | 88.42% (🟰 ±0%) | 90.24% (🟰 ±0%) | 80% (🟰 ±0%) | 88.42% (🟰 ±0%) | 123-127, 147-148, 246-251, 299-300 |
| packages/ocap-kernel/src/remotes/platform/transport.ts | 82.58% (⬇️ -1.36%) | 80.72% (⬇️ -1.20%) | 75.86% (🟰 ±0%) | 82.58% (⬇️ -1.36%) | 113, 153-154, 159-163, 205-214, 247, 281-299, 323, 407, 451-454, 478, 502-507, 510-511, 515-518, 559, 589, 608-610, 619 |
| packages/ocap-kernel/src/rpc/kernel-remote/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/rpc/kernel-remote/remoteIncarnationChange.ts | 60% | 100% | 0% | 60% | 40-41 |
| packages/ocap-kernel/src/store/methods/remote.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |

Generated in workflow #3563 for commit f473710 by the Vitest Coverage Report Action

…hange

When a message triggers dial→handshake and incarnation change is detected,
the message was still being sent with stale content (old seq number, old
object/promise references). This could cause errors on the remote peer.

Changes:
- Update doOutboundHandshake to return { success, incarnationChanged }
- Throw error when incarnation changes during sendRemoteMessage to prevent
  stale message delivery
- Update reconnection-lifecycle to handle new handshake return type
- Fix outdated comment in e2e test (onRemoteGiveUp → onIncarnationChange)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
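
The stale-message guard described in this commit can be sketched as a check on the handshake result before anything is written to the channel. Only the `{ success, incarnationChanged }` shape comes from the commit message; `sendAfterHandshake`, `IncarnationChangedError`, and the `write` callback are hypothetical, and the real `sendRemoteMessage` is presumably asynchronous:

```typescript
// Illustrative only: a synchronous toy version of the send path.
type HandshakeResult = { success: boolean; incarnationChanged: boolean };

class IncarnationChangedError extends Error {
  constructor(peerId: string) {
    super(`Peer ${peerId} restarted; refusing to send stale message`);
    this.name = 'IncarnationChangedError';
  }
}

function sendAfterHandshake(
  peerId: string,
  handshake: HandshakeResult,
  message: string,
  write: (message: string) => void,
): void {
  if (!handshake.success) {
    throw new Error(`Handshake with ${peerId} failed`);
  }
  if (handshake.incarnationChanged) {
    // The message was built against the old incarnation (old seq number,
    // old object/promise references), so it must not be delivered.
    throw new IncarnationChangedError(peerId);
  }
  write(message);
}
```

Throwing instead of silently dropping the message lets the caller surface the failure to whoever queued the send.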
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

The handleIncarnationChange callback was missing kernel promise rejection.
When a remote peer restarts, any pending kernel promises that were waiting
on that peer need to be rejected since the peer has lost its state and
won't be able to resolve them.

This mirrors the behavior in handleRemoteGiveUp and ensures consistent
promise handling for all remote failure modes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
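
One way to picture the "mirrors handleRemoteGiveUp" point is a shared helper that both handlers call to reject promises the remote was responsible for. Everything below is a hypothetical model: the `KernelPromiseRecord` shape, the `decider` field, the rejection strings, and the handler signatures are assumptions, and only the handler names come from the commit message:

```typescript
// Illustrative only: a toy promise table, not the kernel's real one.
type KernelPromiseRecord = {
  state: 'unresolved' | 'fulfilled' | 'rejected';
  decider: string; // assumed: ID of the remote expected to resolve it
  rejection?: string;
};

// Reject every unresolved promise decided by `remoteId`; return their IDs.
function rejectPromisesDecidedBy(
  promises: Map<string, KernelPromiseRecord>,
  remoteId: string,
  reason: string,
): string[] {
  const rejected: string[] = [];
  promises.forEach((record, kpid) => {
    if (record.decider === remoteId && record.state === 'unresolved') {
      record.state = 'rejected';
      record.rejection = reason;
      rejected.push(kpid);
    }
  });
  return rejected;
}

function handleRemoteGiveUp(
  promises: Map<string, KernelPromiseRecord>,
  remoteId: string,
): string[] {
  return rejectPromisesDecidedBy(promises, remoteId, 'remote connection lost');
}

function handleIncarnationChange(
  promises: Map<string, KernelPromiseRecord>,
  remoteId: string,
): string[] {
  // A restarted peer has lost its state and cannot resolve old promises.
  return rejectPromisesDecidedBy(promises, remoteId, 'remote peer restarted');
}
```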
Development

Successfully merging this pull request may close these issues.

Remote comms: Handle reconnection to previously-dead peers with incarnation ID detection
