Severity: Medium
Category: Resource Leak / Race Condition
File: src/plan-orchestrator.ts
Description
cancel() iterates this.runningSessions and calls session.stop() to kill in-flight PTY processes. However, when runPrompt() fails, the per-agent catch blocks in runResearchAgent() and runPlannerAgent() call this.runningSessions.delete(session), which removes the session from the set before stop() has been called on it.
If a session fails for reasons unrelated to cancellation (e.g., Claude CLI crashes, network error), it is ejected from runningSessions by the catch block. If cancel() is subsequently called, that session is no longer in the set and its PTY process is never stopped.
Race Sequence
1. runPrompt() throws due to Claude CLI crash
2. catch block: this.runningSessions.delete(session) ← session removed from set
PTY process still alive
3. cancel() is called by user
4. cancel() iterates runningSessions → session is gone
5. session.stop() is never called
6. PTY process remains alive as an orphan: it is no longer tracked, so no code path will ever stop it
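The sequence above can be reproduced in isolation with a small model. This is only a sketch: MockSession and LeakyOrchestrator are hypothetical stand-ins, not the project's real classes, the failing prompt is simulated with a synchronous throw, and the alive flag stands in for the real PTY process.

```typescript
// Hypothetical stand-in for a PTY-backed session (not the project's real class).
class MockSession {
  alive = true; // models "the PTY process is still running"
  failPrompt: boolean;

  constructor(failPrompt: boolean) {
    this.failPrompt = failPrompt;
  }

  // Simulates runPrompt() failing, e.g. because the CLI crashed.
  runPrompt(): void {
    if (this.failPrompt) throw new Error("simulated CLI crash");
  }

  // Simulates killing the PTY process.
  stop(): Promise<void> {
    this.alive = false;
    return Promise.resolve();
  }
}

// Mirrors the pattern described in the report: delete on error, stop only in cancel().
class LeakyOrchestrator {
  runningSessions = new Set<MockSession>();

  runAgent(session: MockSession): { success: boolean } {
    this.runningSessions.add(session);
    try {
      session.runPrompt();
      return { success: true };
    } catch {
      this.runningSessions.delete(session); // untracked, but never stopped
      return { success: false };
    }
  }

  cancel(): Promise<void[]> {
    const stopPromises: Promise<void>[] = [];
    for (const session of this.runningSessions) {
      stopPromises.push(session.stop().catch(() => {}));
    }
    return Promise.all(stopPromises);
  }
}

const leakedSession = new MockSession(true);
const leakyOrchestrator = new LeakyOrchestrator();
const leakyResult = leakyOrchestrator.runAgent(leakedSession); // steps 1-2
void leakyOrchestrator.cancel();                               // steps 3-5
console.log(leakedSession.alive); // true: cancel() never saw the session
```

In this model the session is still marked alive after cancel() returns, which corresponds to step 6.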
Code
src/plan-orchestrator.ts, research agent catch block (~line 455):
} catch (err) {
  this.runningSessions.delete(session); // ← removed here on non-cancel error
  ...
  return { success: false, ... };
}
src/plan-orchestrator.ts, cancel (~line 269):
for (const session of this.runningSessions) {
  stopPromises.push(session.stop().catch(...)); // ← won't see the ejected session
}
Impact
Under error-heavy conditions (repeated Claude API failures, plan generation retries), multiple PTY processes can accumulate without being killed. Each occupies a file descriptor and process slot. On systems with low file descriptor limits (ulimit -n), this eventually causes EMFILE errors that prevent new PTY spawns, breaking all session creation until the server is restarted.
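One possible mitigation, offered as a sketch rather than as the project's actual fix, is to stop the session in the catch block before removing it from the set. TrackedSession and SaferOrchestrator are hypothetical names, and stop() is modeled as taking effect immediately; in the real async code the catch block would more likely await session.stop() before the delete.

```typescript
// Hypothetical stand-in for a PTY-backed session (not the project's real class).
class TrackedSession {
  alive = true; // models "the PTY process is still running"

  // Always fails, to exercise the error path.
  runPrompt(): void {
    throw new Error("simulated CLI crash");
  }

  // Simulates killing the PTY process.
  stop(): Promise<void> {
    this.alive = false;
    return Promise.resolve();
  }
}

class SaferOrchestrator {
  runningSessions = new Set<TrackedSession>();

  runAgent(session: TrackedSession): { success: boolean } {
    this.runningSessions.add(session);
    try {
      session.runPrompt();
      return { success: true };
    } catch {
      // Kill the PTY first, swallowing stop() errors, then untrack the session.
      void session.stop().catch(() => {});
      this.runningSessions.delete(session);
      return { success: false };
    }
  }
}

const failedSession = new TrackedSession();
const saferOrchestrator = new SaferOrchestrator();
const saferResult = saferOrchestrator.runAgent(failedSession);
console.log(failedSession.alive); // false: stopped before being untracked
```

With this ordering, a later cancel() no longer needs to find the failed session in runningSessions, because the error path has already stopped it.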