fix: address review findings from #177 (state-persistence)#195
Merged
mafredri merged 5 commits into35C4n0r/agentapi-state-persistencefrom Feb 26, 2026
Merged
fix: address review findings from #177 (state-persistence)#195mafredri merged 5 commits into35C4n0r/agentapi-state-persistencefrom
mafredri merged 5 commits into35C4n0r/agentapi-state-persistencefrom
Conversation
…romptSent logic Fix three bugs in pty_conversation.go: 1. updateLastAgentMessageLocked: add role check before overwriting agentMessage with the last loaded message. Previously, when state was loaded with a user message as the last entry, the user content was used as the agent message, creating a corrupted message. 2. Start() snapshot loop: move EmitError call outside c.lock to match all other emit calls (EmitStatus, EmitMessages, EmitScreen). This avoids longer lock hold times and a latent deadlock path. 3. loadStateLocked: when the current initial prompt matches the saved prompt, preserve agentState.InitialPromptSent instead of unconditionally setting initialPromptSent to true. This ensures that if a previous session crashed before sending the prompt, the prompt is retried on reload.
…ID test isProcessRunning uses syscall.Signal(0) and syscall.EPERM which are not supported on Windows, causing the function to always return false. Move it to process_unix.go (with //go:build unix) and add a stub in process_windows.go that returns false with a documenting comment. The "overwrites existing file" test wrote PID 12345, which could match a live process and cause writePIDFile to error. Replace with a non-numeric string so strconv.Atoi fails and the liveness check is skipped. Add a dedicated "detects running process" test using os.Getpid() instead.
…ded growth Previously EventEmitter.errors grew without bound, leaking memory on long-running servers and sending all historical errors to new SSE subscribers. Cap the slice to the most recent 100 errors (maxStoredErrors) so late subscribers still receive useful context without unbounded memory use.
…T, verify PID ownership Add f.Sync() before close in SaveState() so data is flushed to disk before the atomic rename, preventing corrupt state files on power failure. Remove duplicate syscall.SIGINT from signal.Notify in signals_unix.go since os.Interrupt is identical to syscall.SIGINT on Unix. Make cleanupPIDFile read the PID file and verify it contains the current process PID before deleting, preventing a new instance's PID file from being removed by an old instance's deferred cleanup.
|
✅ Preview binaries are ready! To test with modules: |
…fety The bare channel send under lock in the snapshot loop relies on statusLocked blocking Send until the snapshot buffer fills. Document this invariant so future refactors do not break the assumption.
35C4n0r
approved these changes
Feb 26, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Code review of #177 identified several correctness, safety, and cross-platform issues. This PR fixes them.
Issues found and fixed
P1 - Message corruption after state restore (
lib/screentracker/pty_conversation.go)updateLastAgentMessageLockedassumed the last saved message was always an agent message. If a session was saved right after the user sent a message, restoring would overwrite the new agent message with the user's content. Added a role check to the guard condition.P2 - Dropped initial prompt on crash recovery (
lib/screentracker/pty_conversation.go)When restarting with the same initial prompt that was never sent in the crashed session (
InitialPromptSent: false), the code setinitialPromptSent = !isDifferentwhich unconditionally marked it as sent. Now only overrides the saved status when the prompt is actually different.P2 -
EmitErrorcalled under conversation lock (lib/screentracker/pty_conversation.go)All other emit calls (EmitStatus, EmitMessages, EmitScreen) happen after releasing
c.lock, butEmitErrorwas called while holding it. Moved the emit outside the lock to match the established pattern.P2 -
isProcessRunningbroken on Windows (cmd/server/server.go)Used
syscall.Signal(0)without a build tag. On Windows,Signal(0)returns "not supported", so the function always returnedfalse. Split intoprocess_unix.goandprocess_windows.gowith proper build tags. Also fixed a flaky test that hardcoded PID 12345 (could match a running process on CI).P2 - Unbounded
errorsslice in EventEmitter (lib/httpapi/events.go)EmitErrorappended to an unbounded slice. For long-running servers, this is a memory leak. Capped at 100 entries, keeping the most recent.P3 - Missing
f.Sync()before atomic rename (lib/screentracker/pty_conversation.go)The atomic write pattern (temp file, then rename) did not flush data before closing. On power failure, the rename could be committed but file data not flushed, leaving a corrupt state file.
P3 - Duplicate SIGINT registration (
cmd/server/signals_unix.go)os.Interruptandsyscall.SIGINTare identical on Unix. Removed the duplicate.P3 -
cleanupPIDFilerace (cmd/server/server.go)If a new instance started before the old one's deferred cleanup ran, the new instance's PID file got deleted. Now reads the file and only removes it if the PID matches the current process.
Validation
go test ./lib/screentracker/... ./lib/httpapi/... ./cmd/server/...- all passgo veton all packages - cleanGOOS=windows go vet ./cmd/server/...- cleanRefs #177