Fix: session response lost when IsProcessing cleared prematurely#158
Merged
PureWeen merged 2 commits intoPureWeen:mainfrom Feb 20, 2026
Conversation
When IsProcessing is cleared prematurely (by watchdog timeout, SessionErrorEvent, or reconnect race), accumulated delta content in CurrentResponse was silently lost. The next SessionIdleEvent would skip CompleteResponse because IsProcessing was already false, leaving the UI with no response despite the SDK completing the work. Fixes: - CompleteResponse: flush CurrentResponse even when IsProcessing is already false, so late-arriving SessionIdleEvent still preserves accumulated content - Watchdog timeout: call FlushCurrentResponse before clearing IsProcessing - SessionErrorEvent: call FlushCurrentResponse before clearing IsProcessing - Reconnect path in SendPromptAsync: increment ProcessingGeneration on the new SessionState before retrying SendAsync, preventing replayed SessionIdleEvent from the old turn from interfering with the retry turn's completion Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4a7f256 to
99ed9b1
Compare
The fallback path in LaunchFixIt() used 'git checkout -b <branch>' which creates the new branch from whatever commit HEAD points to. If the repo was on a feature branch, the fix branch would inherit unrelated commits, polluting the PR. Changes: - LaunchFixIt fallback: detect upstream default branch (upstream/main, origin/main, etc.) and base the new branch on it - BuildCopilotPrompt: add step 8 instructing Copilot to verify branch ancestry before pushing - copilot-instructions.md: add 'verify branch is based on main' rule Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
f28ef76 to
b38e6e5
Compare
PureWeen
added a commit
that referenced
this pull request
Feb 21, 2026
Invariant audit found 4 pre-existing gaps where IsProcessing was cleared without resetting all companion fields: 1. CompleteResponse: missing ActiveToolCallCount reset 2. SendAsync reconnect+retry failure: missing IsResumed, HasUsedToolsThisTurn, ActiveToolCallCount, FlushCurrentResponse 3. SendAsync initial failure: same missing fields 4. Remote mode abort: missing IsResumed These are the exact same class of bug that caused regressions in PRs #148, #158, and #164. Every path that sets IsProcessing=false must also clear: IsResumed, HasUsedToolsThisTurn, ActiveToolCallCount, ProcessingStartedAt, ToolCallCount, ProcessingPhase, and FlushCurrentResponse. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen
added a commit
that referenced
this pull request
Feb 21, 2026
…edge Add comprehensive documentation of the recurring stuck-session bug pattern (7 PRs, 16 fix/regression cycles) to copilot-instructions.md: - Full cleanup checklist for all IsProcessing=false paths - Table of all 7 paths with locations - 7 common mistakes with PR references where each occurred - Staleness check and IsResumed clearing documentation - Cross-thread volatile field requirements - ProcessingGeneration guard explanation - Watchdog diagnostic log tag additions This knowledge was hard-won across PRs #141, #147, #148, #153, #158, #163, #164 and should prevent future regressions by making the invariants explicit and discoverable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen
added a commit
that referenced
this pull request
Feb 25, 2026
## Problem After app restart, resumed sessions that were mid-turn show **Thinking...** with a Stop button. The user must manually click Stop every time. The existing watchdog waited 600s (10 min!) before clearing stuck IsProcessing. ## Solution Add a **30s resume quiescence timeout** for sessions that receive zero SDK events after restart. If no events flow within 30s of app start, the session is cleared as stuck. ### Key design decisions (informed by 4-model consultation: Opus 4.6, Sonnet 4.6, Codex 5.3, GPT-5.1): 1. **30s quiescence** — short enough users don't wait, long enough for SDK reconnect (~5s typical, 6x safety margin) 2. **Event-gated** — only fires when \HasReceivedEventsSinceResume == false\. Once events start flowing, transitions to normal 120s/600s timeout tiers 3. **Seed from DateTime.UtcNow, NOT file time** — all 3 models independently flagged that seeding from events.jsonl would cause immediate kills for sessions >15s old (exact PR #148 regression pattern) 4. **Reuses existing watchdog fire path** — no new IsProcessing cleanup code, all 8 invariants preserved ### Timeout tiers (3-tier, was 2-tier): | Condition | Timeout | |-----------|---------| | Resumed, zero events since restart | **30s** (NEW) | | Normal processing, no tools | 120s | | Active tools / resumed with events / multi-agent | 600s | ## Tests - **16 new regression guard tests** covering quiescence edge cases, seed time safety, exhaustive timeout matrix - Updated existing tests to use \ComputeEffectiveTimeout\ helper mirroring production 3-tier formula - **108 total watchdog+recovery tests pass** ✅ ## Regression history context This code has been through 7 PRs of fix/regression cycles (PRs #141→#147→#148→#153→#158→#163→#164). The most relevant precedent: PR #148 added a 10s resume timeout that killed active sessions. Our 30s timeout avoids this by being event-gated and seeded from UtcNow. Fixes the 'click Stop on every restart' UX issue. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bug
Session shows "Thinking" with no response in the UI, but the SDK/CLI side completed the work and is ready for the next message.
IsProcessingisfalsebut the response content was never flushed to the message history.Root Cause
When
IsProcessingis cleared prematurely (by watchdog timeout,SessionErrorEvent, or reconnect race condition), accumulated delta content inCurrentResponse(StringBuilder) was silently lost. The subsequentSessionIdleEvent→CompleteResponsecall was skipped becauseIsProcessingwas alreadyfalse.Three code paths could trigger this:
IsProcessingbut doesn't flushCurrentResponseIsProcessingwithout flushing accumulated partial responseSessionStatehasProcessingGeneration=0, replayedSessionIdleEventfrom old turn passes generation check (0==0) and clearsIsProcessingbefore retry turn completesFix (Commit 1: response flush)
CurrentResponseeven whenIsProcessingis alreadyfalse(events can arrive after premature clearing)FlushCurrentResponse()before settingIsProcessing = falseFlushCurrentResponse()before settingIsProcessing = falseProcessingGenerationon newSessionStatebefore retrySendAsync, so replayedSessionIdleEventhas stale generation and is skippedFix (Commit 2: branch ancestry)
The "Fix with Copilot" feature created branches from the current HEAD instead of
main, causing PRs to include unrelated commits when the repo happened to be on a feature branch.upstream/main,origin/main, etc.) and base the new branch on itTests
Added
ResponseFlushTests.cswith 15 tests covering: