Fix multi-agent bridge bugs and watchdog timeout for workers by PureWeen · Pull Request #195 · PureWeen/PolyPilot

PureWeen · 2026-02-23T15:07:04Z

Summary

Fixes 6 bugs discovered while testing multi-agent PR review orchestration on desktop and mobile.

Bridge fixes (mobile)

Unblock WebSocket message loop: The SendMessage handler awaited SendPromptAsync, which blocks until the full response arrives, so no other client messages could be processed in the meantime. The call is now fire-and-forget via Task.Run.
Prevent history overwrite: On TurnEnd, SyncRemoteSessions overwrote the incrementally built streaming history with a stale cache. The client now requests fresh history before clearing the streaming guard.
Stop IsProcessing race: SyncRemoteSessions unconditionally overwrote IsProcessing from the periodic sessions list, racing with the event-driven TurnStart/TurnEnd updates. It now skips processing-state updates for actively streaming sessions.
Broadcast organization state: OnStateChanged sent only SessionsList, not OrganizationState, so mobile never saw group/role changes. It now broadcasts both.
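
The first bridge fix can be illustrated outside the codebase. The following is a minimal Python sketch of the fire-and-forget pattern, not the project's C# code: `send_prompt` is a hypothetical stand-in for a long-running call such as SendPromptAsync, and `read_loop` is a hypothetical stand-in for the per-client WebSocket read loop. The loop starts the prompt as a background task and keeps reading, so a later message such as an abort is handled before the response finishes.

```python
import asyncio


async def send_prompt(prompt, log):
    # Hypothetical stand-in for a long-running call such as SendPromptAsync.
    await asyncio.sleep(0.05)
    log.append(f"response:{prompt}")


async def read_loop(messages, log):
    # Hypothetical stand-in for the per-client read loop.
    background = set()  # keep references so tasks are not garbage-collected
    for kind, payload in messages:
        if kind == "send":
            # Fire-and-forget: start the prompt, do not await it here.
            task = asyncio.create_task(send_prompt(payload, log))
            background.add(task)
            task.add_done_callback(background.discard)
        else:
            log.append(f"handled:{kind}")
        await asyncio.sleep(0)  # yield, as a real socket read would
    # Only for the demo: wait for outstanding work before returning.
    await asyncio.gather(*background)


def run_demo():
    log = []
    asyncio.run(read_loop([("send", "review PR"), ("abort", None)], log))
    return log
```

In this sketch the abort is logged before the response. Awaiting `send_prompt` inside the loop would reverse that order, which is the blocking behavior the fix removes.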

Multi-agent orchestration fixes

Worker dispatch regex: The ParseTaskAssignments capture group (\S+) matched only the first word of worker names containing spaces (e.g. "PR Review Squad-worker-1"). Changed to ([^\n]+?).
Preset re-creation: Role/group/model assignment was inside the same try block as CreateSessionAsync, so recreating an existing Squad skipped all assignments. The assignment now runs outside the try block.
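
The difference between the two capture groups is easy to check in isolation. The sketch below uses Python's `re` module. The surrounding `@worker:<name>` line format and the trailing `[ \t]*$` anchor are assumptions made for illustration, since the PR quotes only the capture groups and not the full ParseTaskAssignments pattern.

```python
import re

# Hypothetical line format "@worker:<name>"; only the two capture groups
# (\S+) and ([^\n]+?) are taken from the PR description.
OLD = re.compile(r"^@worker:(\S+)", re.MULTILINE)
NEW = re.compile(r"^@worker:([^\n]+?)[ \t]*$", re.MULTILINE)

TEXT = "@worker:PR Review Squad-worker-1\nReview the diff for thread-safety issues.\n"


def worker_names(pattern, text):
    # Return every captured worker name in the orchestrator's output.
    return pattern.findall(text)
```

With this input, `worker_names(OLD, TEXT)` yields only `"PR"`, while `worker_names(NEW, TEXT)` yields the full name `"PR Review Squad-worker-1"`. A lazy group like `([^\n]+?)` needs something after it (here the end-of-line anchor) to force it to expand past the first character.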

Watchdog timeout fix

Multi-agent workers killed prematurely: The 120s inactivity timeout fired before text-heavy workers (PR reviews, no tools) completed. Their responses were then lost because CompleteResponse skipped when IsProcessing was already false. The fix caches IsMultiAgentSession on SessionState at send time, so the watchdog can read it safely from its background thread, and multi-agent sessions now use the 600s timeout.
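
The timeout selection can be sketched as follows. This is a simplified Python illustration, not the project's C# code: the 120s and 600s values and the idea of caching the flag at send time come from the PR, while `mark_sent`, `watchdog_timeout`, and the `has_active_tool` field are hypothetical names introduced here.

```python
from dataclasses import dataclass

INACTIVITY_TIMEOUT_S = 120       # default inactivity timeout (from the PR)
TOOL_EXECUTION_TIMEOUT_S = 600   # longer timeout reused for multi-agent sessions


@dataclass
class SessionState:
    name: str
    is_multi_agent_session: bool = False  # cached at send time
    has_active_tool: bool = False         # hypothetical field for illustration


def mark_sent(state, multi_agent_names):
    # Runs on the sending thread, where the organization data is safe to
    # read; the watchdog later sees only the cached boolean.
    state.is_multi_agent_session = state.name in multi_agent_names


def watchdog_timeout(state):
    # Runs on the watchdog's background thread; reads only cached state.
    if state.is_multi_agent_session or state.has_active_tool:
        return TOOL_EXECUTION_TIMEOUT_S
    return INACTIVITY_TIMEOUT_S
```

The watchdog never touches the shared group lists. It depends only on a value copied onto the session before the background thread looks at it.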

Tests

6 new tests for worker name parsing and multi-agent watchdog behavior
All 1,187 tests pass

Review

Fix reviewed by Opus 4.6, Sonnet 4.5, and GPT-5.2 — all agreed on the thread-safety fix (cache multi-agent flag at send time vs. reading Organization lists from background thread).

…top IsProcessing race

Three fixes for mobile bridge reliability:

1. WsBridgeServer: Fire-and-forget SendPromptAsync in SendMessage handler. The handler was awaiting ResponseCompletion, which blocks for the entire response duration (minutes), preventing abort/switch/new messages from being processed by the per-client WebSocket read loop.
2. CopilotService.Bridge: On TurnEnd, request fresh history before clearing the streaming guard. Previously, removing from _remoteStreamingSessions immediately allowed SyncRemoteSessions to overwrite incrementally-built history with a stale SessionHistories cache, losing the last message.
3. CopilotService.Bridge: Skip IsProcessing updates from SessionsList for sessions that are actively streaming. The periodic sessions list could race with event-driven TurnStart/TurnEnd, causing stop button flicker.

Also fixes: ParseTaskAssignments regex now captures worker names with spaces (e.g. 'PR Review Squad-worker-1') instead of only the first word.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
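
Items 2 and 3 of this commit both rely on a "streaming guard" set. The following is a simplified, synchronous Python sketch of that idea, assuming dictionary-based session records. The function names, parameters, and the `fetch_history` callback are hypothetical, and the real code requests history asynchronously.

```python
def sync_remote_sessions(local, remote_list, cached_histories, streaming):
    # Periodic sync from the sessions list.
    for name, remote in remote_list.items():
        session = local.setdefault(name, {"is_processing": False, "history": []})
        if name in streaming:
            # Event-driven state wins while a turn is streaming: skip both
            # the IsProcessing update and the cached-history overwrite.
            continue
        session["is_processing"] = remote["is_processing"]
        if name in cached_histories:
            session["history"] = list(cached_histories[name])


def on_turn_end(name, local, streaming, fetch_history):
    # Order matters: fetch fresh history first, then drop the guard, so a
    # sync that runs afterwards cannot reinstate a stale cache.
    local[name]["history"] = fetch_history(name)
    local[name]["is_processing"] = False
    streaming.discard(name)
```

Clearing the guard before the fresh history arrives would reopen the window in which a periodic sync replaces the streamed messages with an older snapshot.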

OnStateChanged only broadcast SessionsList, not OrganizationState. This left mobile with stale group assignments: sessions moved between groups on desktop wouldn't update on mobile until a specific org-triggering operation occurred.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

CreateGroupFromPresetAsync had role/group/model assignment inside the same try block as CreateSessionAsync. If the session already existed (e.g. recreating the same Squad team), CreateSessionAsync threw, so the orchestrator lost its Orchestrator role and workers lost their group assignment and system prompts.

Move assignment outside the try so it runs regardless of whether session creation succeeded or was skipped.

Also adds 3 tests for ParseTaskAssignments with worker names containing spaces (the regex fix from the prior commit).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
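
The restructuring described in this commit can be sketched in a few lines. This Python version is an illustration under assumed names (`SessionExistsError`, `create_session`, `create_member_from_preset`, and the dictionary-based session store are all hypothetical). It shows only the control-flow change: assignment happens after the try block rather than inside it.

```python
class SessionExistsError(Exception):
    """Hypothetical error raised when a session name is already taken."""


def create_session(sessions, name):
    # Hypothetical stand-in for CreateSessionAsync.
    if name in sessions:
        raise SessionExistsError(name)
    sessions[name] = {}


def create_member_from_preset(sessions, name, role, group):
    try:
        create_session(sessions, name)
    except SessionExistsError:
        pass  # recreating an existing team: keep the session as-is
    # Outside the try: runs whether creation succeeded or was skipped.
    sessions[name].update(role=role, group=group)
```

If the `update` call sat inside the `try`, an already-existing session would jump straight to the `except` branch and never receive its role or group.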

Workers doing text-heavy tasks (e.g., PR reviews) can take 2-4 minutes without tool calls. The 120s inactivity timeout was killing workers prematurely: the watchdog cleared IsProcessing and added a 'stuck' warning, then the actual response arrived but CompleteResponse skipped because IsProcessing was already false, losing the response.

Now sessions in multi-agent groups use the 600s tool-execution timeout. The multi-agent flag is cached on SessionState at send time (UI thread) so the watchdog can read it safely from its background thread without accessing the Organization lists (plain List<T>, UI-thread-only).

The orchestration loop already has its own 10-minute per-worker timeout via CancelAfter, so the watchdog is a safety net, not the primary guard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen and others added 4 commits February 23, 2026 03:32

PureWeen merged commit e2d6c70 into main Feb 23, 2026

PureWeen deleted the MultiReviewTest branch February 23, 2026 15:08
