Skip to content

Fix RPC extension UI session reconciliation for editor/newSession flows#127

Open
jorgeavaldez wants to merge 135 commits intodnouri:masterfrom
jorgeavaldez:feat/rpc/support-editor-resolve-session-changes
Open

Fix RPC extension UI session reconciliation for editor/newSession flows#127
jorgeavaldez wants to merge 135 commits intodnouri:masterfrom
jorgeavaldez:feat/rpc/support-editor-resolve-session-changes

Conversation

@jorgeavaldez
Copy link
Contributor

@jorgeavaldez jorgeavaldez commented Feb 18, 2026

Fixes RPC-mode extension UX so extension-driven flows (for example the handoff extension) behave like the pi CLI.

Handoff flow:

  • generate a prompt
  • let user edit it
  • start a new session
  • insert drafted prompt into the main input buffer

There were three gaps affecting extension flows:

  1. missing support for extension_ui_request method editor
  2. no explicit session-switch event when extensions call ctx.newSession()
  3. session reconciliation after editor responses could fail when callbacks ran outside the chat buffer (e.g. temporary editor buffer context)

What changed

  • Added full extension_ui_request editor support:

    • temporary editor replaces the existing input area (same window)
    • C-c C-c submit, C-c C-k cancel
    • restores previous buffer and cleans up temporary editor buffer on completion
  • Fixed unsupported extension UI responses to always echo the original request :id (for correct RPC correlation).

  • Added extension session reconciliation after extension-driven UI actions:

    • run get_state, compare previous/current session identity (session-file, fallback session-id)
    • if changed, reload get_messages history and update usage/header
    • otherwise refresh state/header only
  • Expanded reconciliation triggers:

    • after sending extension UI responses (confirm / select / input / editor)
    • after set_editor_text
    • agent_end fallback sync for hook/tool-driven session changes
  • Added handling for editor callbacks from non-chat buffers:

    • pi-coding-agent--extension-ui-send-response now accepts optional chat-buf
    • pi-coding-agent--extension-ui-sync-session-state now accepts optional proc/chat-buf
    • editor submit/cancel captures originating chat buffer and passes it through, so reconciliation runs against the correct session even when callback execution context is the temporary editor buffer

Validation

Added/updated unit tests for:

  • editor submit/cancel behavior
  • unsupported-method response correlation
  • session sync changed vs unchanged paths after set_editor_text
  • agent_end fallback sync
  • extension UI response paths triggering sync
  • editor submit sync when callback runs outside chat buffer
  • handoff-style flow simulation: editor submit → new session state reconcile → set_editor_text

Render test suite passes when run against the local checkout.

Disclaimer: This pull request was put together with assistance from an LLM. I reviewed and revised it to the best of my abilities, and I do believe I have a solid understanding of how it works and why this is a good fix 😄

dnouri and others added 30 commits December 31, 2025 13:05
- Handle thinking_start, thinking_delta, thinking_end events
- Use ```thinking markdown fence which becomes #+begin_src thinking
- Add pi-thinking face (italic, comment color)
- Add font-lock to apply face to thinking block content
- Simple streaming approach: just insert markers at start/end, no replacement
Add GitHub Actions workflows for automated testing:
- Unit tests on Emacs 28.2/29.4
- Integration tests with Ollama (qwen3:1.7b)
- GUI tests with real LLM streaming
- Nightly tests against pi-mono main branch

Key changes:
- Split ci.yml into focused workflows (unit/integration/gui)
- Add nightly matrix testing: 0.30.2, latest, main
- Use qwen3:1.7b (supports tool calling)
- Optimize GUI tests to 7 essential tests (4 LLM calls)
- Add debug output for CI test failures
- Add scripts/ollama.sh for local Docker testing
- Support PI_BIN override for testing local pi builds

Local usage:
  make test-gui                           # NPM version
  make test-gui PI_BIN=/path/to/cli.js    # Local build

GUI tests focus on behavior requiring real windows:
- Scroll preservation during streaming
- Auto-scroll when at buffer end
- End-to-end tool invocation
* fix: make pending requests per-process to avoid cross-session interference

Root cause: pi--pending-requests was a global hash table shared by ALL
pi sessions. When any session's process exited (via pi-quit or process
death), it cleared ALL pending request callbacks for ALL sessions.

This caused a subtle bug: quitting session A would silently break
session B's ability to receive responses. The user would send a prompt,
see their message displayed, but never receive a response because the
callback was cleared.

Changes:
- Remove global pi--pending-requests variable
- Add pi--get-pending-requests to create/retrieve per-process hash table
  stored as a process property
- Update pi--rpc-async to use per-process table
- Update pi--dispatch-response to use per-process table
- Update pi--handle-process-exit to only clear that process's requests
- Add defensive checks in pi--send-prompt with helpful error messages
  when process is unavailable

All 198 unit tests and 10 integration tests pass.

* fix: bind q to pi-quit for consistent quit behavior

Previously, q in chat buffer was bound to quit-window while C-c C-p q
used pi-quit. This caused inconsistent behavior:

- quit-window: might kill or bury buffer, but leaves input window
  orphaned (showing some other buffer)
- pi-quit: properly collects windows, kills both buffers, then
  deletes the input window

Now both q and C-c C-p q use pi-quit for clean, consistent cleanup.
Skip aborted messages when updating pi--last-usage for context
percentage display. Aborted messages may have incomplete usage
data (zeros if aborted before message_start event), which would
reset the header to show 0%.

Matches TUI footer.ts behavior: only exclude aborted, not error
(errors still consume context and have valid usage data).

Fixes dnouri#7
* build: Add package-lint to local linting checks

Adds package-lint (MELPA's linter) to catch issues before submission:
- New lint-package target in Makefile
- Integrated into check.sh pre-commit script
- Auto-installs package-lint from MELPA if not present

This catches issues like:
- Private functions that shouldn't be autoloaded
- Package-Requires in non-main files
- Various MELPA conventions

* ci: Add melpazoid workflow for MELPA style checks

Melpazoid catches additional style issues beyond package-lint:
- Top-level define-key (should be in keymap definition)
- Idiomatic patterns (bolp instead of point comparisons)
- License checks
- And more

Runs on push/PR to main branches. Warnings are informational.

* ci: Separate lint workflow from unit tests

- Unit Tests workflow now runs 'make test-unit'
- New Lint workflow runs 'make lint' (checkdoc + package-lint)
- Melpazoid workflow remains separate for style checks
- Nightly still runs full 'make check'

Added 'test-unit' make target for consistency - each workflow
now maps to exactly one make target:

  test-unit.yml        -> make test-unit
  lint.yml             -> make lint
  test-integration.yml -> make test-integration-ci
  test-gui.yml         -> make test-gui-ci
  nightly.yml          -> make check
  melpazoid.yml        -> (external)
- Remove autoload from private function pi--setup-session
- Remove Package-Requires from pi-core.el (only effective in main file)
- Use (not (bolp)) instead of point/line-beginning-position comparison
- Move define-key calls inside pi-input-mode-map defvar
- Add test for slash-capf bolp edge case

All issues reported by package-lint and melpazoid are now resolved.
2026-01-02

- Change pi--make-separator to output org heading format (* prefix)
- Update pi--post-process-org to demote headings by one level
- Content headings now become sub-headings of You/Assistant separators
- Enables org-mode folding and structure navigation
- Display HH:MM for today, YYYY-MM-DD HH:MM for older messages
- Timestamp right-aligned, eats into right dashes, label stays centered
- New pi-timestamp face (inherits shadow) for subtle styling
- Extract timestamp from message :timestamp field (ms since epoch)
- Capture current-time for live messages
Closes dnouri#5

- Show 'Compacting...' message while compaction runs
- Display compaction result with token count and summary
- Allow canceling auto-compaction with C-c C-k
- Update context usage % after compaction
2026-01-02

Add support for auto-retry events, hook errors, and streaming errors:

- Handle auto_retry_start/end events with visual retry notices
- Display hook_error events with hook name and error details
- Handle error type in message_update events
- Show "no model available" warning when API key missing
- Prevent duplicate Assistant headers during retry sequences
- Add pi-retry-notice and pi-error-notice faces
- Track retry state (is-retrying, retry-attempt, last-error)
- Clear error/retry state on agent_start/end events
- Add comprehensive tests for all new event types
When an API returns an error (e.g., 400 'Could not process image'),
the error arrives in message_end with stopReason='error' and the
error details in errorMessage. This was being silently ignored.

Now we call pi--display-error for these cases, matching the TUI
behavior in assistant-message.ts.
Previously, calling (pi "session-name") while already in a pi buffer
would ignore the session argument and reuse the current session.

Now we only reuse the current session when no session name is provided.
…ffer

Create new session when name provided from pi buffer
Simplify the rendering pipeline from:
  Agent → Markdown → [pandoc] → Org → Display
to:
  Agent → Markdown → Display

Key changes:
- Derive pi-chat-mode from gfm-mode instead of org-mode
- Replace pandoc conversion with native markdown rendering
- Replace drawer syntax (:BASH:...:END:) with overlay-based tool blocks
- Tool blocks use background colors matching CLI (pending/success/error)
- Align markdown tables after streaming completes
- Update thinking block regexp for markdown fence syntax

Dependencies:
- Add: markdown-mode (pure Elisp, from MELPA)
- Remove: pandoc (external binary)

Net result: -16 lines of code, simpler architecture, no external tools.
pi-gui-test-streaming-p was only checking for 'streaming' status,
but after pi-send the status is 'sending' (waiting for response).
This caused a race condition where tests would complete before
the LLM response arrived.
Small LLMs like qwen3:1.7b sometimes skip tool calls unless
explicitly instructed. Changed prompt from 'Read the file' to
'Use the read tool to show the contents of' for reliability.
Migrate from org-mode to markdown-mode

Simplify the rendering pipeline from:
  Agent → Markdown → [pandoc] → Org → Display
to:
  Agent → Markdown → Display

Key changes:
- Derive pi-chat-mode from gfm-mode instead of org-mode
- Replace pandoc conversion with native markdown rendering
- Replace drawer syntax (:BASH:...:END:) with overlay-based tool blocks
- Tool blocks use background colors matching CLI (pending/success/error)
- Align markdown tables after streaming completes
- Update thinking block regexp for markdown fence syntax

Dependencies:
- Add: markdown-mode (pure Elisp, from MELPA)
- Remove: pandoc (external binary)
- test-gui.yml: 15 minute timeout
- test-integration.yml: 15 minute timeout
- nightly.yml: 10 min for unit, 15 min for integration/GUI

Prevents hung tests from consuming 6 hours of CI time.
Normal runs complete in 3-5 minutes.
Small LLMs sometimes skip tool calls with ambiguous prompts.
Changed 'Use the read tool to show...' to 'Call the read tool
on X and show me its contents...' which makes the tool call
requirement explicit.
…le (dnouri#21)

Previously, session activity state was tracked in two places:
- pi--status (buffer-local symbol: idle/sending/streaming/compacting)
- :is-streaming in pi--state plist (boolean)

This duplication was error-prone and confusing. Now:
- pi--status is the single source of truth, defined in pi-core.el
- pi--update-state-from-event sets status on agent_start/agent_end
- pi--extract-state-from-response returns :status for initialization
- :is-streaming and :is-compacting removed from pi--state plist

Benefits:
- Clearer code: one variable to check, not two
- Status set atomically in event handler, not scattered in display functions
- Tests are simpler and more focused
* refactor: Rename files from pi-*.el to pi-coding-agent-*.el

Per MELPA maintainer request, renaming the package from 'pi' to
'pi-coding-agent' to avoid namespace collision.

* refactor: Update provide/require to pi-coding-agent

- (provide 'pi) → (provide 'pi-coding-agent)
- (provide 'pi-core) → (provide 'pi-coding-agent-core)
- All test file provide/require statements updated similarly

* refactor: Update package headers and footers

- ;;; pi.el → ;;; pi-coding-agent.el
- ;;; pi-core.el → ;;; pi-coding-agent-core.el
- All test file headers similarly updated
- Docstrings updated to reference pi-coding-agent

* refactor: Update defgroup and :group references to pi-coding-agent

- (defgroup pi) → (defgroup pi-coding-agent)
- :group 'pi → :group 'pi-coding-agent (27 occurrences)
- Kept :prefix "pi-" for ergonomic function naming

* refactor: Update internal documentation strings

- "Version of pi.el" → "Version of pi-coding-agent"
- "Core functionality for pi.el" → "Core functionality for pi-coding-agent"

* refactor: Update test file docstrings

- "core pi.el functionality" → "core pi-coding-agent functionality"
- "pi.el test files" → "pi-coding-agent test files"
- "tests for pi.el" → "tests for pi-coding-agent"

* refactor: Update test data paths

Update sample directory path in session-dir-name test

* refactor: Update Makefile file references

- Update byte-compile to use new filenames
- Update test loading to use new feature names
- Update lint commands for new filenames
- PI_BIN and CLI references unchanged (refers to external tool)

* refactor: Rename all pi- symbols to pi-coding-agent-

Full MELPA compliance: all public and private symbols now use
the pi-coding-agent- prefix to match the package name.

- Renamed ~200 functions, variables, faces, and constants
- Main entry point is now M-x pi-coding-agent
- Mode names: pi-coding-agent-chat-mode, pi-coding-agent-input-mode
- Face names: pi-coding-agent-separator, etc.
- Customization: pi-coding-agent-rpc-timeout, etc.
- Fixed docstrings that exceeded 80 chars due to longer names
- Updated all test calls from (pi) to (pi-coding-agent)

The short 'pi' alias was removed as package-lint doesn't allow
aliases with non-conforming prefixes. Users can add their own:
  (defalias 'pi 'pi-coding-agent)

* fix: Update test scripts for renamed functions

- Update run-gui-tests.sh to use pi-coding-agent-gui-test- prefix
- Fix (pi) call in gui-test-utils to (pi-coding-agent)

* docs: Update README for pi-coding-agent package name

- Update title and all package references
- Update installation instructions (package-install, use-package)
- Update configuration variable names
- Update M-x command from 'pi' to 'pi-coding-agent'
- Add tip for creating short alias
- Update all GitHub and MELPA URLs to new repo name

* ci: Update MELPA recipe in melpazoid workflow

Update package name from 'pi' to 'pi-coding-agent' and repo from
'dnouri/pi.el' to 'dnouri/pi-coding-agent'

* docs: Add 'pi' alias to all use-package examples

Users who copy the example config will get M-x pi automatically.
Updated Usage section to reference M-x pi since users have the alias.

* fix: Quote gfm-mode symbol in docstring

Checkdoc requires Lisp symbols to be quoted in docstrings.
…dnouri#34)

Root cause: Overlay created with rear-advance=t for streaming, but
move-overlay doesn't clear the rear-advance property. The overlay
kept extending whenever text was inserted at or after its end.

Fix: Replace overlay with a new one (rear-advance=nil) when finalizing.
This affects both normal tool completion and abort scenarios.

Changes:
- Add pi-coding-agent--tool-overlay-finalize helper that replaces the
  overlay with a non-rear-advance version
- Use finalize helper in display-tool-end (success/error cases)
- Use finalize helper in display-agent-end (abort case)
- Update toggle-tool-output to adjust overlay bounds after content change

Closes dnouri#31
* Update for pi-coding-agent 0.37 API changes

- reset -> new_session command
- hook_error -> extension_error event
- hookPath -> extensionPath field
- queuedMessageCount -> pendingMessageCount
- Update CI to test against 0.37.5

* Add integration test for new_session command

Catches API breaking changes like reset -> new_session

* Update branch API: entryIndex -> entryId

The branch command now uses entryId (UUID) instead of entryIndex (number).
Format function now takes optional index parameter for display.

* Improve integration tests for API compatibility

- Simplify new_session test to not require LLM response
- Add get_branch_messages test to verify entryId field
- Both tests catch API breaking changes without slow LLM calls

* Auto-reinstall pi when cached version differs from requested

Prevents stale cache from causing test failures after version upgrades.
Skips check for 'latest' since we can't compare version strings.
After the rename to pi-coding-agent, the check.sh script still
referenced the old file names (pi.el, pi-core.el, etc.).

Also added package initialization for markdown-mode dependency,
matching the Makefile's approach.
The async callback in pi-coding-agent-new-session relied on
current buffer context to find the chat buffer, but async
callbacks run in arbitrary buffer contexts.

This caused pi-coding-agent--clear-chat-buffer to silently do
nothing when the callback was executed in a non-pi buffer.

Fix: Capture chat-buf before the async call and use
with-current-buffer in the callback, consistent with other
commands like pi-coding-agent-cycle-thinking.

Also add buffer-live-p check to handle case where buffer is
killed while waiting for RPC response.
* fix: Handle empty vector and nil text in branch selector

Two issues fixed:

1. Empty vector detection: JSON arrays are parsed as vectors,
   and (null []) is nil (false). Changed to use seq-empty-p
   to properly detect empty messages list.

2. Nil text handling: Added (or text "") to handle case where
   message text might be nil, preventing wrong-type-argument error
   from truncate-string-to-width.

Added tests for both edge cases.

* fix: Async callback context issues in branch and resume

Multiple functions used --get-chat-buffer and --get-input-buffer
inside async callbacks, but callbacks run in arbitrary buffer
contexts where these functions return nil.

Changes:
- --load-session-history: Accept optional chat-buf parameter
- --display-session-history: Accept optional chat-buf parameter
- --show-branch-selector: Capture chat-buf and input-buf before
  async call, pass to --load-session-history
- pi-coding-agent-resume-session: Capture chat-buf before async call

Also includes previous fixes:
- Empty vector detection: Use seq-empty-p instead of null
- Nil text handling: Default to empty string in format-branch-message

Added test for load-session-history buffer parameter.

* refactor: Address code review feedback

- Consistent naming: Use chat-buf instead of target-buf in
  --load-session-history (matches parameter name)
- Defensive coding: Add buffer-live-p check in --display-session-history
- Documentation: Add note about async callback usage, fix docstring
  (MESSAGES is a vector, not a list)
- Clarity: Add comment explaining why text nil check is needed
  (RPC may return null)
… (dnouri#39)

Add font-lock-ensure call after toggling tool output. JIT font-lock
only fontifies visible portions lazily, so expanding collapsed content
left it unhighlighted until the user scrolled around.

The fix follows the established pattern from --finalize-streaming-message
and --render-history-text which both call font-lock-ensure after
inserting content.
…uri#24) (dnouri#40)

The --spinner-stop function now accepts an optional buffer parameter.
When called from async callbacks (which may run in arbitrary buffer
context like process sentinel), the buffer must be passed explicitly.

Without this fix, if the process died during manual compaction:
1. Callback runs in sentinel context
2. --spinner-stop calls --get-chat-buffer which returns nil
3. (delq nil spinning-sessions) doesn't remove actual buffer
4. Spinner continues spinning forever

This follows the same pattern used to fix async callback context
issues in branch, resume, and new-session.
dnouri and others added 28 commits February 8, 2026 01:57
* fix: Pin all CI workflows to minimum supported pi version (0.51.3)

- test-gui.yml: 0.50.1 → 0.51.3
- test-integration.yml: 0.50.4 → 0.51.3
- nightly.yml: 0.50.1 → 0.51.3 in version matrix
- README.org: update nightly description (0.48.0 → 0.51.3)
- Remove dead get_commands skip guards (0.50.1 compat no longer needed)
- Makefile: set load-prefer-newer to avoid stale .elc in make test

* refactor: Centralize PI_VERSION — Makefile is single source of truth

Workflows now extract PI_VERSION from Makefile instead of hardcoding it.
Updating the pinned version only requires changing one line in the Makefile.

Nightly simplified to test just two versions: pinned + latest.
Removed pi-mono main branch builds (noise — latest covers forward compat).

* fix: Session-name integration test race condition

The test called accept-process-output once for 1 second, which returns
as soon as ANY output arrives.  If residual output from the prompt phase
was buffered, it returned immediately — before the set_session_name RPC
response arrived.  The file was then read before the write happened.

Fix: poll for the RPC callback flag with a proper timeout loop, matching
the pattern used by all other integration tests.

* fix: Sync pi-coding-agent-core.el version header to 1.2.6

Was stuck at 0.1.0 while the main file had 1.2.6. Both files are
part of the same package and should have matching versions.
…ri#109)

- Change stats key from S to i to avoid conflict with Skills S submenu
- Switch submenu run keys from digits (1-9) to letters (a-z) to support
  more than 9 commands without 'Key sequence 1 0' crash
- Switch submenu edit keys from shift-numbers (!@#) to uppercase letters
  (A-Z) so a/A always refer to the same command
- Extract shared ordering helper (submenu-commands-ordered) so run and
  edit key positions always correspond
- Cap both key sets at 26 entries
Split the 4,300-line monolith into a strict dependency chain:

  pi-coding-agent.el  (entry point, 165 lines)
    ├── menu.el       (transient menu, session management)
    ├── input.el      (history, completion, send/abort, queuing)
    └── render.el     (chat rendering, tool output, fontification)
          └── ui.el   (shared state, faces, modes, display primitives)
                └── core.el  (JSON/RPC protocol — unchanged)

Test files mirror the source structure (6 test files, 454 tests).
Per-module make targets for fast feedback during development.
Purely structural — no behavioral changes.
…ates (dnouri#113)

Fontify buffers (" *pi-fontify:<lang>*") were globally shared by
language. When two sessions wrote files in the same language, they
corrupted each other's buffer — producing garbled syntax highlighting
and triggering full O(n) re-fontification on every token instead of
incremental O(1) appends. This caused multi-second freezes during
write tool streaming in multi-session workflows.

Changes:

- Each chat buffer now tracks its own fontify buffers via a buffer-local
  hash table (pi-coding-agent--fontify-buffers). Lookup is by hash
  table, not by buffer name. Buffers are killed on session end.

- Skip the delete+reinsert cycle in display-tool-streaming-text when
  the visible tail content is unchanged. Most LLM tokens extend a
  partial line not shown in the rolling preview, making ~90% of deltas
  no-ops for the buffer mutation path. fontify-sync still runs on every
  delta to keep the fontify buffer current.

- Remove handle-message-update, which accumulated text deltas via
  O(n²) concat into a field never read during streaming.

- Surface fontify-sync errors via message instead of swallowing
  silently with (condition-case nil ... (error nil)).

Benchmarks (Emacs 30.1, 500-delta write tool, xvfb real windows):

  Two-session victim (mean):  36,145µs → 1,537µs  (23.5x faster)
  Two-session victim (p95):   84,656µs → 2,753µs  (30.7x faster)
  Single-session write (p50):    641µs →   594µs   (stable)
  Bash streaming (mean):          79µs →    28µs   (2.8x faster)

  Fontify buffer corruption:  246 shrink events → 0
  Fontify buffer leaks:       confirmed → fixed

Realistic partial-token benchmark (1,623 tokens, 171 lines of Python):

  Display updates: 171 out of 1,623 deltas (89.5% skip rate)
  Mean per-delta: 1,063µs  (includes 89.5% near-free no-ops)
* fix: reduce test model maxTokens from 8192 to 500

qwen3:1.7b generates ~100-200 thinking tokens per response that cannot
be disabled via Ollama's OpenAI-compatible API. With maxTokens 8192,
the model can ramble for 600+ tokens (21s locally, ~105s on CI).

Reducing to 500 caps worst-case output while still allowing:
- Tool calls (~160 tokens including thinking overhead)
- Short text responses (~300 tokens after thinking)

This eliminates the long-tail variance that pushes CI runs past timeouts.

* fix: increase test timeouts for CI reliability

Integration: 180s → 300s (steer test needs 2 LLM turns, ~170s on CI)
GUI: 90s → 180s (tool calls take 50-65s on CI, 90s gave only 38% margin)

Added docstrings explaining the rationale for each value.

* fix: add Ollama model warmup to all CI workflows

First inference after model pull pays a 10-30s cold start penalty for
loading weights into memory.  Add a minimal warmup call (think:false,
num_predict:1) after pull to absorb this cost during setup rather than
during the first test.

Applied to: nightly, test-integration, test-gui workflows.

* fix: reduce test model maxTokens from 500 to 300

500 tokens was still too much for the steer test (needs 2 LLM turns).
On a loaded CI runner, one turn took 104s at maxTokens=500.  Two turns
exceeded the 300s timeout.

At maxTokens=300:
- Tool calls still work (use 123-155 tokens including thinking overhead)
- Text responses complete in ~9s locally (vs ~17s at 500)
- Two turns estimated: 90s on CI (5x), 124s under heavy load (10x)
- Both well within the 300s timeout

Text content is empty at 300 tokens (thinking consumes the budget),
but integration tests only check events and RPC mechanics, not content.

* fix: improve GUI test reliability — drop /no_think, poll for content

Two root causes identified via spike testing (240+ trials):

1. /no_think prefix hurts tool calling: Without it, 'Read the file'
   achieves 85% tool call rate (vs 75% with /no_think).  The prefix
   is just noise that wastes the model's thinking budget.

2. ~2% tool re-call rate: The model occasionally calls read again
   after getting results (following the tool description's 'continue
   with offset' instruction).  This loops indefinitely in the agent.

Fixes:
- Drop /no_think from all GUI test prompts
- content-tool-output-shown: poll for expected content instead of
  waiting for idle, making the test immune to re-call loops
- scroll-preserved-tool-use: same pattern — wait for tool output
  before checking scroll position
- Bump maxTokens 300→400 for better tool call success rate while
  keeping steer test timing reasonable (~200s estimated for 2 turns)

* fix: tool-overlay-bounded — poll for content, avoid idle race

The test failed in 0.007s because of a race condition: the user
message is displayed (satisfying wait-for-send-start), but the model
hasn't started streaming yet, so wait-for-idle returns immediately.
The test then checks for 'BEFORE' which isn't in the chat yet.

Fix: send with no-wait and poll for expected content, same pattern
as content-tool-output-shown.  This also makes the test immune to
the ~2% tool re-call loop.

* fix: tool-overlay-bounded — split into two simple prompts

The combined prompt ('Read file and say ENDMARKER') has ~15-30%
tool call failure rate because it's too complex for qwen3:1.7b.

Split into two prompts:
1. 'Read the file PATH' — simple, ~85% tool call rate
2. 'Say ENDMARKER' — no tool needed, text only

The test still verifies the overlay boundary: ENDMARKER text (from
the second turn) must not be inside the tool block overlay (from
the first turn).  The overlay is per-tool-block, so text from any
subsequent turn exercises the same boundary logic.

* fix: set maxTokens=500 + integration timeout 600s

CI runs revealed two competing constraints:
- Tool calling needs maxTokens≥500 (at 300-400, ~15-30% no-tool-call rate)
- Steer test needs low maxTokens (2 turns × 150s/turn at maxTokens=500)

Resolution: use maxTokens=500 for reliable tool calls (run 1 had
13/13 GUI pass at 500), and increase integration timeout to 600s to
accommodate the steer test.  Workflow job timeout bumped to 20min
for integration jobs (nightly + PR/push).

* fix: tool-overlay-bounded — use fresh session to avoid race

Tests 7-11 share a session created after extension tests killed the
original.  This new session has a race condition where wait-for-idle
returns before the model starts streaming, causing 0.007s completions.

Use with-fresh-session for the overlay test to guarantee a properly
initialized session, matching how content-tool-output-shown succeeds
(it uses the original session from test 1, which works reliably).
* refactor: remove npm cache from CI workflows

The cache saved ~5-10s per job but caused a critical bug: the
'latest' cache key (deps-Linux-pi-latest) never busted, so nightly
pi@latest tests were running pi 0.30.2 instead of the actual latest
(0.52.9).  npm install takes ~7-15s — not worth the complexity.

* feat: bump pinned pi version 0.51.3 → 0.52.9

All RPC commands we use (get_commands, set_session_name, etc.) work
correctly with 0.52.9.  The 0.52.x series adds image support for
steer/follow_up and auto-compaction resume — backward-compatible
changes that don't affect our test suite.

* trigger CI
…up (dnouri#116)

Copying from the chat buffer preserves raw markdown by default —
useful for pasting into docs, Slack, or any markdown-aware context.

For plain text without **, backticks, or code fences:
- Defcustom pi-coding-agent-copy-visible-text: set to t to make all
  copy commands (M-w, C-w) strip hidden markup automatically
- Command pi-coding-agent-copy-visible: explicit one-off, bindable

Uses filter-buffer-substring-function to strip invisible and
display-empty text. Delegates to buffer-substring--filter when disabled.
Derive pi-coding-agent-input-mode from gfm-mode instead of text-mode.
This gives the prompt buffer markdown highlighting (bold, italic, code
spans, fenced blocks, headings) while keeping markup visible.

All pi keybindings (C-c C-c, C-c C-k, TAB, M-p, M-n, etc.) override
their gfm-mode counterparts via the derived mode keymap.
The pi CLI sends ANSI color codes in extension_ui_request setStatus
messages (e.g. plan-mode sends \e[38;5;226m⏸ plan\e[39m for yellow
text). Terminal UIs handle these natively, but Emacs header-lines
render them literally, producing garbled output like:

  │ [38;5;226m⏸ plan[39m

Apply ansi-color-filter-apply in --extension-ui-set-status to strip
escape sequences at the ingestion boundary. The built-in ansi-color
library is already required by render.el.

Add test for ANSI stripping alongside existing setStatus tests.
Strip ANSI escape codes from extension status text
…ut (dnouri#119)

Register a jit-lock function that runs after font-lock and strips
markdown text properties (display, invisible, font-lock-multiline,
face) from the pending tool overlay, restoring intended faces for
both header and content regions.
* test: add thinking whitespace normalization coverage

* fix: normalize thinking block whitespace while streaming

* fix: harden thinking streaming normalization and state reset
Press f on any user turn in the chat buffer to fork the conversation
from that point.  Uses the get_tree RPC to map visible buffer position
to entry ID, handling compaction (compacted-away turns aren't rendered,
so the last-N visible headings are aligned to the active branch).

Also fixes chat navigation (n/p): the old pattern searched for the
never-present "You:" instead of setext headings, and now recenters
the heading to the top of the window.

Changes:
- Setext heading detection: regex + underline predicate, shared across
  navigation and fork-at-point
- Turn detection: collect headings, map point to 0-based turn index
- Tree helpers: flatten-tree (hash index) and active-branch-user-ids
  (leaf-to-root walk)
- execute-fork: extracted shared fork logic (RPC + state refresh +
  history reload + input prefill), used by both fork-at-point and
  the menu fork selector
- resolve-fork-entry: tree-to-entryId mapping, independently testable
- build-tree test helper: flat specs replace deeply nested tree
  construction in tests

49 new tests covering heading detection, navigation, turn indexing,
tree walk scenarios, and fork-at-point integration.
* feat: DWIM toggle for pi-coding-agent entry point

Add agent-shell-style DWIM behavior to the main `pi-coding-agent'
command and a new `pi-coding-agent-toggle' command for toggling
session window visibility.

DWIM behavior for M-x pi-coding-agent (no prefix arg):
- From a pi buffer: toggle visibility (hide windows)
- From a non-pi buffer with existing project session: reuse it
- No session for project: create a new one
- C-u prefix or Lisp string arg: named session (unchanged)

New command `pi-coding-agent-toggle':
- Pi windows visible -> hide them
- Pi windows hidden but session exists -> show them
- No session -> user-error

New functions in pi-coding-agent-ui.el:
- `pi-coding-agent-project-buffers': find all pi chat buffers for
  the current project directory, ordered by buffer-list recency
- `pi-coding-agent--hide-buffers': hide pi windows using a hybrid
  strategy: delete-window when the frame has other windows,
  bury-buffer for sole-window frames. Second pass handles the case
  where bury-buffer lands on the paired pi buffer.

Bug fix in test-common.el:
- Mock session cleanup called nonexistent functions
  `pi-coding-agent--chat-buffer-name' and
  `pi-coding-agent--input-buffer-name' (swallowed by ignore-errors,
  causing buffer leaks). Fixed to use `pi-coding-agent--buffer-name'
  with :chat/:input type argument.

Tests: 5 new tests (502 total, all passing):
- dwim-reuses-existing-session
- new-session-with-prefix-arg
- project-buffers-finds-session
- project-buffers-excludes-other-projects
- toggle-no-session-errors

* refactor: simplify DWIM hide logic, input-first window deletion

- Rename --hide-buffers → --hide-session-windows for clarity
- Process input windows before chat windows: input windows are
  always child splits so delete-window succeeds cleanly, then
  chat windows are handled. Eliminates the second-pass hack
  that was needed when chat was processed first.
- Bind project-buffers result once instead of calling twice
- Move side effects out of cond conditions for readability

* fix: preserve window layout when hiding session

Don't delete-window the chat window — that collapses vertical
splits.  Instead, follow quit's pattern: delete input windows
(child splits collapse into parent), then bury the chat buffer
in its window (switches to previous buffer, window stays).

Before: [code | pi-chat] → hide → [code] (split gone)
After:  [code | pi-chat] → hide → [code | prev-buf] (split kept)

* fix: review cleanup — docs, toggle deps, test hygiene

Address issues from code review:

- Remove 'how we got there' paragraph from --hide-session-windows
  docstring (comments should explain what/why, not history)
- Remove redundant negative comment ('don't delete the window')
- Add --check-dependencies to pi-coding-agent-toggle (was missing)
- Add toggle to Usage header in Commentary
- Rewrite prefix-arg test: no longer nests inside with-mock-session
  macro unnecessarily; proper unwind-protect structure
- Add test for DWIM reuse of named sessions (explicit coverage for
  the behavior change where project-buffers finds named sessions)

---------

Co-authored-by: Conor Nash <conor@nbs.consulting>
Co-authored-by: Daniel Nouri <daniel.nouri@gmail.com>
@jorgeavaldez jorgeavaldez force-pushed the feat/rpc/support-editor-resolve-session-changes branch from 55001f1 to 7a2cf41 Compare February 18, 2026 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants