Skip to content

fix: address review findings from #177 (state-persistence)#195

Merged
mafredri merged 5 commits into35C4n0r/agentapi-state-persistencefrom
mux/review-fixes-for-177
Feb 26, 2026
Merged

fix: address review findings from #177 (state-persistence)#195
mafredri merged 5 commits into35C4n0r/agentapi-state-persistencefrom
mux/review-fixes-for-177

Conversation

@mafredri
Copy link
Member

🤖 Generated by Mux

Code review of #177 identified several correctness, safety, and cross-platform issues. This PR fixes them.

Issues found and fixed

P1 - Message corruption after state restore (lib/screentracker/pty_conversation.go)

updateLastAgentMessageLocked assumed the last saved message was always an agent message. If a session was saved right after the user sent a message, restoring would overwrite the new agent message with the user's content. Added a role check to the guard condition.

P2 - Dropped initial prompt on crash recovery (lib/screentracker/pty_conversation.go)

When restarting with the same initial prompt that was never sent in the crashed session (InitialPromptSent: false), the code set initialPromptSent = !isDifferent which unconditionally marked it as sent. Now only overrides the saved status when the prompt is actually different.

P2 - EmitError called under conversation lock (lib/screentracker/pty_conversation.go)

All other emit calls (EmitStatus, EmitMessages, EmitScreen) happen after releasing c.lock, but EmitError was called while holding it. Moved the emit outside the lock to match the established pattern.

P2 - isProcessRunning broken on Windows (cmd/server/server.go)

Used syscall.Signal(0) without a build tag. On Windows, Signal(0) returns "not supported", so the function always returned false. Split into process_unix.go and process_windows.go with proper build tags. Also fixed a flaky test that hardcoded PID 12345 (could match a running process on CI).

P2 - Unbounded errors slice in EventEmitter (lib/httpapi/events.go)

EmitError appended to an unbounded slice. For long-running servers, this is a memory leak. Capped at 100 entries, keeping the most recent.

P3 - Missing f.Sync() before atomic rename (lib/screentracker/pty_conversation.go)

The atomic write pattern (temp file, then rename) did not flush data before closing. On power failure, the rename could be committed but file data not flushed, leaving a corrupt state file.

P3 - Duplicate SIGINT registration (cmd/server/signals_unix.go)

os.Interrupt and syscall.SIGINT are identical on Unix. Removed the duplicate.

P3 - cleanupPIDFile race (cmd/server/server.go)

If a new instance started before the old one's deferred cleanup ran, the new instance's PID file got deleted. Now reads the file and only removes it if the PID matches the current process.

Validation

  • go test ./lib/screentracker/... ./lib/httpapi/... ./cmd/server/... - all pass
  • go vet on all packages - clean
  • GOOS=windows go vet ./cmd/server/... - clean

Refs #177

…romptSent logic

Fix three bugs in pty_conversation.go:

1. updateLastAgentMessageLocked: add role check before overwriting
   agentMessage with the last loaded message. Previously, when state
   was loaded with a user message as the last entry, the user content
   was used as the agent message, creating a corrupted message.

2. Start() snapshot loop: move EmitError call outside c.lock to match
   all other emit calls (EmitStatus, EmitMessages, EmitScreen). This
   avoids longer lock hold times and a latent deadlock path.

3. loadStateLocked: when the current initial prompt matches the saved
   prompt, preserve agentState.InitialPromptSent instead of
   unconditionally setting initialPromptSent to true. This ensures
   that if a previous session crashed before sending the prompt, the
   prompt is retried on reload.
…ID test

isProcessRunning uses syscall.Signal(0) and syscall.EPERM which are not
supported on Windows, causing the function to always return false. Move
it to process_unix.go (with //go:build unix) and add a stub in
process_windows.go that returns false with a documenting comment.

The "overwrites existing file" test wrote PID 12345, which could match a
live process and cause writePIDFile to error. Replace with a non-numeric
string so strconv.Atoi fails and the liveness check is skipped. Add a
dedicated "detects running process" test using os.Getpid() instead.
…ded growth

Previously EventEmitter.errors grew without bound, leaking memory on
long-running servers and sending all historical errors to new SSE
subscribers.

Cap the slice to the most recent 100 errors (maxStoredErrors) so late
subscribers still receive useful context without unbounded memory use.
…T, verify PID ownership

Add f.Sync() before close in SaveState() so data is flushed to disk
before the atomic rename, preventing corrupt state files on power
failure.

Remove duplicate syscall.SIGINT from signal.Notify in signals_unix.go
since os.Interrupt is identical to syscall.SIGINT on Unix.

Make cleanupPIDFile read the PID file and verify it contains the
current process PID before deleting, preventing a new instance's PID
file from being removed by an old instance's deferred cleanup.
@github-actions
Copy link

✅ Preview binaries are ready!

To test with modules: agentapi_version = "agentapi_195" or download from: https://github.com/coder/agentapi/releases/tag/agentapi_195

@mafredri mafredri changed the title fix(state-persistence): address review findings from #177 fix: address review findings from #177 (state-persistence) Feb 26, 2026
@mafredri mafredri requested a review from 35C4n0r February 26, 2026 13:01
…fety

The bare channel send under lock in the snapshot loop relies on
statusLocked blocking Send until the snapshot buffer fills. Document
this invariant so future refactors do not break the assumption.
Copy link
Collaborator

@35C4n0r 35C4n0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @mafredri

@mafredri mafredri merged commit b2cbf56 into 35C4n0r/agentapi-state-persistence Feb 26, 2026
3 checks passed
@mafredri mafredri deleted the mux/review-fixes-for-177 branch February 26, 2026 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants