Add Reflection Cycle (Ralph's Loop) for iterative goal-driven sessions #92
Merged
Copilot AI changed the title from "[WIP] Add mechanism for symbolic loop implementation" to "Add ReflectionCycle: iterative goal-driven prompt refinement mechanism" on Feb 13, 2026
4ab977f to 86e2f8f
6165cdc to 2d5d911
PureWeen added a commit that referenced this pull request on Feb 17, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0e1e0cd to 7d0337d
…ment
Co-authored-by: PureWeen <5375137+PureWeen@users.noreply.github.com>
…state in Advance method
Co-authored-by: PureWeen <5375137+PureWeen@users.noreply.github.com>
…rompts
- Replace fragile Contains("Goal complete") with [[RALPH_COMPLETE]] sentinel
detected via line-anchored regex (eliminates false positives in natural prose)
- Add stall detection via Jaccard similarity (>90% threshold) and exact hash
matching over sliding window — stops after 2 consecutive stalls
- Improve follow-up prompt: requires progress assessment, discourages premature
completion, uses "Ralph's Loop" branding
- Update event handler to log stall reason alongside goal-met/max-iterations
- Expand tests from 18 to 31: false-positive guards for old marker and natural
prose, stall detection, sentinel positioning, prompt content validation
- User interruption already handled in EnqueueMessage (cancels active cycle)
Consulted 5 models (Opus, GPT-5, Gemini 3 Pro, Sonnet, GPT-5.1) for design.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
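The sentinel change above can be illustrated with a short sketch. This is not the PR's C# code; it is a minimal Python illustration that assumes "line-anchored" means the sentinel must be the only non-whitespace text on its line. The names `SENTINEL_RE` and `is_goal_complete` are hypothetical.

```python
import re

# Sentinel named in the commit message (renamed to a REFLECTION variant later).
SENTINEL = "[[RALPH_COMPLETE]]"

# Assumption: the sentinel must stand alone on its line, ignoring surrounding
# spaces/tabs and an optional carriage return. re.MULTILINE makes ^ and $
# match at every line boundary rather than only at the ends of the string.
SENTINEL_RE = re.compile(
    r"^[ \t]*" + re.escape(SENTINEL) + r"[ \t]*\r?$",
    re.MULTILINE,
)


def is_goal_complete(response: str) -> bool:
    """Return True only when the sentinel appears on a line of its own."""
    return SENTINEL_RE.search(response) is not None
```

With this shape, a response that merely says "Goal complete" or mentions the sentinel mid-sentence is not treated as finished, which is the class of false positive the commit describes.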
…ssages
- Add /reflect slash command: /reflect <goal> [--max N] starts a cycle, /reflect stop cancels it. Goal text is also sent as the initial prompt.
- Add status pill in session header: shows '🔄 Reflecting 2/5' with pulse animation while cycle is active, clickable to stop
- Add completion system messages in chat: ✅ goal met, ⚠️ stalled, ⏱️ max iterations reached; all include the goal text
- Rename all RALPH references to REFLECTION (sentinel, branding, docs)
- Wire OnStopReflection EventCallback through ExpandedSessionView
- Update /help to document the new command
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…arnings
- Add ChatMessageType.Reflection with purple compact rendering: auto follow-ups now show as '🔄 Iteration 2/5' instead of full prompt text
- Don't cancel cycle on user message: queued user messages run alongside the reflection loop, allowing mid-loop steering
- Add stall warning system message on first stall detection, before the 2-stall threshold kills the cycle
- Add skipHistoryMessage to SendPromptAsync: reflection follow-ups go to the SDK but don't add a verbose user message to History
- Persist reflection/stall/completion messages to chat DB
- Expose ConsecutiveStalls and ShouldWarnOnStall on ReflectionCycle
- Add IsReflectionFollowUpPrompt helper for prompt type detection
- 3 new tests (334 total passing)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ntrol
- Rich /reflect help: typing /reflect with no args shows interactive guide
with quick start examples, how-it-works, and all available commands
- Goal text in iteration messages: BuildFollowUpStatus JSON includes summary
field with truncated goal ("🔄 Iteration 2/5 — Fix tests")
- Context usage warnings: yellow at 70%, red at 90% during active reflection
cycle to prevent context exhaustion
- Rich completion summary: BuildCompletionSummary() shows emoji, goal, iteration
count, duration, and outcome (including similarity % for stalls)
- Stall similarity %: CheckStall() now exposes LastSimilarity score, shown in
warning messages ("91% similarity with previous response")
- Progress indicator on pill: reflection pill shows fill progress bar and
truncated goal text, plus paused state indicator
- Pause/resume: /reflect pause and /reflect resume commands for inspecting
progress without cancelling the cycle
- StartedAt/CompletedAt timestamps on ReflectionCycle for duration tracking
- IsPaused state prevents Advance() from incrementing while paused
- 14 new tests (45 total): pause/resume, similarity exposure, completion
summary variants, duration tracking, long goal truncation
- Updated /help to document pause/resume commands
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
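The stall detection described in these commits (Jaccard similarity above 90%, exact hash matching over a sliding window, stop after 2 consecutive stalls, similarity score exposed for warnings) could look roughly like the sketch below. It is a Python illustration, not the PR's C# CheckStall(). The whitespace tokenization, the window size of 3, and the class and method names are assumptions.

```python
import hashlib
from collections import deque


class StallDetector:
    """Flags a stall when a response nearly repeats a recent one."""

    def __init__(self, threshold=0.9, window=3, max_stalls=2):
        self.threshold = threshold          # stall when similarity > threshold
        self.max_stalls = max_stalls        # consecutive stalls before stopping
        self.recent = deque(maxlen=window)  # (sha256 hex digest, token set) pairs
        self.consecutive_stalls = 0
        self.last_similarity = 0.0

    @staticmethod
    def jaccard(a, b):
        """|A intersect B| / |A union B|; two empty sets count as identical."""
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    def check(self, response):
        """Record a response; return True when the cycle should stop."""
        digest = hashlib.sha256(response.encode("utf-8")).hexdigest()
        tokens = frozenset(response.lower().split())
        best = 0.0
        for old_digest, old_tokens in self.recent:
            # An exact hash match short-circuits to full similarity.
            sim = 1.0 if old_digest == digest else self.jaccard(tokens, old_tokens)
            best = max(best, sim)
        self.last_similarity = best
        stalled = best > self.threshold
        self.consecutive_stalls = self.consecutive_stalls + 1 if stalled else 0
        self.recent.append((digest, tokens))
        return self.consecutive_stalls >= self.max_stalls
```

In this sketch the first stall only raises the counter, which leaves room for a warning message before the second consecutive stall ends the cycle. Resetting the detector on resume would amount to clearing `recent` and zeroing the counter.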
… busy
- Send goal as initial prompt via SendPromptAsync when session is idle, or EnqueueMessage when session is already processing
- Add SkipReflectionEvaluationOnce flag on SessionState to prevent evaluating the pre-existing response against the reflection goal
- Show queued status message when cycle starts during active processing
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Task.Run + delay allowed the copilot agent to race and grab the session before the queued reflection prompt could be sent. Now dispatches immediately on the current synchronization context via fire-and-forget SendPromptAsync, with ContinueWith re-queuing on failure.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…all reset
- Marshal ContinueWith re-queue callback to UI thread via InvokeOnUI
- Purge queued reflection follow-up prompts on StopReflectionCycle
- Escape session name in JS eval with EscapeForJs helper
- Dispatch immediately on resume when session is idle
- Add ResetStallDetection() and call on resume to avoid false stalls
- Use int.TryParse with clamping for --max to prevent OverflowException
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The tests hardcoded 'copilot' but on Windows the binary is 'copilot.exe'. Added CopilotBinaryName helper and 'win-x64' to alternative RIDs fallback, matching the logic already in CopilotService.GetBundledCliPath().
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Create a hidden evaluator session (gpt-4.1) on cycle start to judge worker responses independently instead of self-evaluation
- Evaluator receives goal + latest response, returns PASS/FAIL format
- Worker gets specific evaluator feedback as next iteration prompt
- Evaluator session is hidden from sidebar, auto-cleaned on cycle end
- Falls back to sentinel-based self-evaluation if evaluator unavailable
- Add AdvanceWithEvaluation() to ReflectionCycle model
- Add ParseEvaluatorResponse(), BuildEvaluatorPrompt(), BuildFollowUpFromEvaluator()
- Add IsHidden to AgentSessionInfo, filter from GetAllSessions()
- Add 19 new tests for evaluator parsing and cycle behavior (456 total)
- Make StartReflectionCycle async to await evaluator session creation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After async evaluator returns and enqueues a follow-up prompt, the session is already idle (CompleteResponse finished). Explicitly dispatch the queued message via Task.Run + sync context post when IsProcessing is false. Also show evaluator PASS verdict in chat on cycle completion.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Filter hidden sessions from ReconcileOrganization to prevent them appearing in sidebar groups
- Filter hidden sessions from SaveActiveSessionsToDisk to prevent them being restored on app restart
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Evaluator prompt now demands flaws be found in early iterations and only becomes lenient on the final iteration. This ensures the reflection loop actually iterates instead of passing on the first attempt.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fbda2ed to a06d256
Evaluator now explicitly instructed to find flaws and push for higher quality in iterations 1-2, ensuring multi-iteration reflection loops.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Fix evaluator session leak on CloseSession — call StopReflectionCycle
at start of CloseSessionAsync to clean up evaluator session
2. Fix PASS false positive — remove fuzzy Contains('PASS') fallback in
ParseEvaluatorResponse, treat unknown format as FAIL
3. Fix race condition — capture cycle reference in evaluator closure,
verify with ReferenceEquals before advancing to prevent wrong-cycle
corruption after user restart
4. Fix unreachable paused pill — change pattern to { IsActive: true,
IsPaused: false } so paused state is shown correctly
5. Fix resume skipHistoryMessage — add skipHistoryMessage: true to
prevent reflection follow-up from appearing as user message
Add 13 new tests covering: false positive parsing (surpass, mid-sentence),
strict PASS parsing, cycle identity verification, paused state behavior,
evaluator prompt strictness levels, and evaluator feedback tracking.
565 tests total, all passing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
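Fix 2 above (strict PASS parsing, unknown format treated as FAIL) can be sketched as follows. This is a Python illustration rather than the PR's ParseEvaluatorResponse(). The exact evaluator output format is not shown on this page, so the sketch assumes the verdict is the first non-empty line, either "PASS" or "FAIL: <feedback>". The `Verdict` type and function name are hypothetical.

```python
import re
from typing import NamedTuple


class Verdict(NamedTuple):
    passed: bool
    feedback: str


# Assumed format: the first non-empty line is "PASS" or "FAIL: <feedback>".
# Matching is case-sensitive and anchored, so "surpass" or a mid-sentence
# "PASS" never counts as a passing verdict.
_PASS_RE = re.compile(r"^PASS\b[\s:.\-]*$")
_FAIL_RE = re.compile(r"^FAIL\b[\s:\-]*(.*)$", re.DOTALL)


def parse_evaluator_response(text: str) -> Verdict:
    stripped = text.strip()
    first_line = stripped.splitlines()[0].strip() if stripped else ""
    if _PASS_RE.match(first_line):
        return Verdict(True, "")
    fail = _FAIL_RE.match(stripped)
    if fail:
        # Everything after the FAIL marker is passed on as worker feedback.
        return Verdict(False, fail.group(1).strip())
    # Unknown format: fail closed and keep the raw text as feedback.
    return Verdict(False, stripped)
```

Failing closed means a malformed evaluator reply costs at most one extra iteration, whereas a fuzzy substring match could end the cycle early on a response that never passed.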
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add screenshot_*.png, *.bmp, *.tiff to .gitignore
- Add explicit warnings to copilot-instructions.md:
  - NEVER commit screenshots/images/binaries
  - NEVER use git add -A blindly; review staged files first
  - Always check git status before committing
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds a Reflection Cycle (internally called "Ralph's Loop") — an iterative, goal-driven prompt mechanism that allows Copilot sessions to autonomously work toward a stated goal across multiple turns.
What's included
Core Model (ReflectionCycle.cs)
- [[REFLECTION_COMPLETE]] sentinel for structured completion detection
- ConsecutiveStalls counter with configurable threshold for auto-stop
- ShouldWarnOnStall and BuildFollowUpStatus helpers

Service Integration (CopilotService)
- StartReflectionCycle() / StopReflectionForSession() API
- CompleteResponse when cycle is active
- /reflect <goal> slash command support

UI (ExpandedSessionView, ChatMessageItem)
- ChatMessageType.Reflection for compact purple badge messages

Tests (31 tests)
- ReflectionCycleTests.cs: sentinel detection, stall logic, iteration limits, follow-up prompts
- ChatMessageTests.cs: reflection message type factory methods

Performance work moved
Session switching performance optimizations have been moved to branch perf-session-switching for a separate PR.
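
For orientation, the overall cycle summarized above can be sketched as a single loop with three exits: goal met, stalled, or max iterations reached. This is a simplified Python illustration, not the PR's C# ReflectionCycle. The `send_prompt` callable, the default of 5 iterations, the follow-up prompt wording, and the exact-match stall check (standing in for similarity-based detection) are all assumptions.

```python
from typing import Callable, Tuple

SENTINEL = "[[REFLECTION_COMPLETE]]"


def run_reflection_cycle(
    goal: str,
    send_prompt: Callable[[str], str],  # hypothetical: sends a prompt, returns the reply
    max_iterations: int = 5,
) -> Tuple[str, int]:
    """Iterate toward a goal; return (outcome, iterations used)."""
    prompt = goal  # the goal text doubles as the initial prompt
    previous = None
    stalls = 0
    for iteration in range(1, max_iterations + 1):
        response = send_prompt(prompt)
        # Exit 1: the sentinel appears on a line of its own.
        if any(line.strip() == SENTINEL for line in response.splitlines()):
            return "goal_met", iteration
        # Exit 2: two consecutive repeats of the previous response.
        stalls = stalls + 1 if response == previous else 0
        if stalls >= 2:
            return "stalled", iteration
        previous = response
        # Hypothetical follow-up wording; the real prompt text is not shown here.
        prompt = (
            f"Iteration {iteration + 1}/{max_iterations}. Goal: {goal}\n"
            "Assess progress so far, then continue. "
            f"Write {SENTINEL} on its own line only when the goal is fully met."
        )
    # Exit 3: the iteration budget ran out.
    return "max_iterations", max_iterations
```

A caller would pass a function that forwards the prompt to a session and returns its reply; the returned outcome maps onto the three completion messages listed in the commits (goal met, stalled, max iterations reached).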