Skip to content

Conversation

@phuongnd08
Copy link
Collaborator

Summary

Investigated PTY file descriptor leak where stats dialog reports "Potential Leaked PTYs" (e.g., "2 masters - 1 sessions").

Root Cause

Through comprehensive testing (see added integration tests), I discovered the leak originates in node-pty itself, not our application code:

  • node-pty opens 2 /dev/ptmx file descriptors per PTY session on macOS
  • After calling kill(), only 1 FD is released
  • Result: 1 FD leaked per terminated PTY session

Fixes Applied

Despite the underlying node-pty limitation, this PR makes critical improvements to prevent additional leaks from our code:

1. Always call ptyProcess.kill()

File: packages/core/src/utils/shell.ts:161-164

Even after sending SIGKILL to the process group, we must call ptyProcess.kill() to signal node-pty to release resources:

// CRITICAL: Always call ptyProcess.kill() to close file descriptors
// Even after killing the process group, we must call this to release PTY FDs
ptyProcess.kill('SIGKILL');

2. Properly dispose exit listener

File: packages/core/src/services/ShellSessionManager.ts

We now properly store and dispose the exit listener disposable to prevent the PTY object from being retained in memory:

// Store the exit disposable
session.exitDisposable = onPtyExit(ptyProcess, (exitCode) => {
  // ...
});

// Later, in terminateSession:
if (session.exitDisposable) {
  session.exitDisposable.dispose();
  session.exitDisposable = undefined;
}

Testing

Added two comprehensive integration tests:

  1. pty-leak-integration.test.ts: Tests with our ShellSessionManager

    • Creates 5 sessions, terminates all
    • Result: 5 FDs leaked (expected due to node-pty)
  2. node-pty-direct-leak.test.ts: Tests node-pty directly without our code

    • Creates 5 PTYs with raw node-pty, kills all
    • Result: 5 FDs leaked (proves leak is in node-pty)
    • Confirms: Our code is not the source of the leak

Documentation

Created PTY_LEAK_INVESTIGATION.md with:

  • Full analysis of the leak
  • Test results and reproduction steps
  • Short-term and long-term recommendations
  • Explanation of stats dialog calculation

Impact

  • ✅ Memory cleanup improved (no PTY object retention)
  • ✅ Stats dialog accurately reports the leak
  • ✅ Application remains stable with proper monitoring
  • ⚠️ Remaining 1 FD/session leak is an accepted node-pty limitation

Recommendations

Short Term

  1. Accept the limitation: ~1 FD per terminated PTY will remain open
  2. Monitor FD usage: Stats dialog tracks this
  3. Increase FD limit if needed: ulimit -n 4096 for heavy terminal users

Long Term

  1. Investigate node-pty alternatives
  2. Report to node-pty project with our reproduction case
  3. Consider worker process isolation

Test plan

  • Run integration tests to verify leak is documented
  • Manually test terminal creation/destruction in app
  • Verify stats dialog shows accurate PTY master FD counts
  • Confirm memory is properly released despite FD leak

🤖 Generated with Claude Code

phuongnd08 and others added 5 commits November 14, 2025 11:38
## Problem
The diagnostics system was incorrectly reporting PTY leaks. Opening one
terminal in the fix-pty-leak worktree showed:
- PTY Master Devices: 2
- Active PTY Sessions: 1
- Leaked PTY Masters: 1 (false positive!)

## Root Cause
node-pty opens /dev/ptmx twice per PTY session:
1. Once for the master FD
2. Once for internal operations

This is standard node-pty behavior. The leak detection logic at
stats-dialog.html:1018-1026 incorrectly assumed 1 FD = 1 session,
leading to false positives (2 FDs - 1 session = 1 "leak").

## Solution
- Modified getAppPtyFileDescriptors() to normalize the count by dividing
  raw /dev/ptmx FD count by 2 (packages/core/src/utils/system-diagnostics.ts:889-931)
- Updated stats dialog to display "PTY Master Sessions" instead of
  "PTY Master Devices" for clarity (apps/desktop/src/main/stats-dialog.html)
- Updated formatExtendedDiagnostics() to show normalized count in text output
- Added comprehensive comments explaining the normalization

## Result
Now correctly reports:
- PTY Master Sessions: 1 (normalized from 2 FDs)
- Active PTY Sessions: 1
- No false leak warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Problem
When creating terminal splits rapidly (e.g., vertical split then 2 horizontal
splits in quick succession), the app reported PTY leaks:
- PTY Master Sessions: 2 (after normalization)
- Active PTY Sessions: 1
- Leaked PTYs: 1

## Root Cause
Terminal IDs were generated using only `Date.now()`:
```typescript
const terminalId = `${worktreePath}-${Date.now()}`;
```

When multiple splits occurred within the same millisecond (very common with
rapid user actions), they received identical terminal IDs. This caused:
1. Multiple terminals to share the same PTY session
2. Closing one terminal wouldn't terminate the shared PTY
3. False "leaked" PTY detection

## Solution
Added a global counter to ensure unique terminal IDs even when created in
the same millisecond:

```typescript
let terminalIdCounter = 0;
const terminalId = `${worktreePath}-${Date.now()}-${terminalIdCounter++}`;
```

## Verification
Created comprehensive tests:
1. TerminalIdCollision.test.ts - Proves Date.now() alone causes collisions
2. TerminalSplitLeakTest.test.ts - Verifies ShellSessionManager works correctly
3. TerminalSplitLeakFixed.test.ts - Demonstrates the fix prevents session sharing

All tests pass:
- With counter: 4 terminals → 4 unique sessions → no leaks
- Without counter: 4 terminals → 1 shared session → false leaks

## Files Changed
- apps/desktop/src/renderer/components/TerminalGrid.tsx
  - Added terminalIdCounter global variable
  - Updated terminal ID generation in 2 locations (initial + split)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Rationale
As suggested in code review, using Date.now() is unnecessary when we have
a counter. The counter alone provides:

1. **Uniqueness**: Guaranteed unique IDs across all terminal creations
2. **Simplicity**: Shorter, cleaner IDs (e.g., "/path-0" vs "/path-1234567890-0")
3. **Readability**: Easier to debug and trace in logs
4. **Performance**: No Date.now() calls needed

## Changes
- Removed Date.now() from terminal ID generation in TerminalGrid.tsx
- Updated terminal ID format from `${path}-${Date.now()}-${counter}` to `${path}-${counter}`
- Updated tests to reflect simpler ID format

## Before
```typescript
const terminalId = `${worktreePath}-${Date.now()}-${terminalIdCounter++}`;
// Example: /Users/foo/project-1763112690681-0
```

## After
```typescript
const terminalId = `${worktreePath}-${terminalIdCounter++}`;
// Example: /Users/foo/project-0
```

All 25 terminal tests pass ✅

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ount

## Problem
After terminal ID collision fix, leak still reported:
- PTY Master Sessions: 2
- Active PTY Sessions: 1
- Reported: 1 leaked PTY

The previous normalization (divide by 2) was incorrect approach.

## Root Cause
We were normalizing the FD count by dividing by 2, then comparing
normalized count against active sessions. This caused issues when
the ratio wasn't exactly 2:1.

The correct approach: Compare RAW FD count against EXPECTED FD count.

## Solution
1. Return RAW /dev/ptmx FD counts (removed normalization)
2. Calculate expected FDs: activeSessions * 2
3. Compare: leakedFDs = rawFDs - expectedFDs

## Before
```
ptyMasterFds = rawCount / 2  // Normalization
leak = ptyMasterFds - activeSessions  // Wrong comparison
```

## After
```
ptyMasterFds = rawCount  // Keep raw
expectedFDs = activeSessions * 2
leak = ptyMasterFds - expectedFDs  // Correct comparison
```

## Display
Now shows:
- PTY Master FDs: 4 (2 sessions)
- Active PTY Sessions: 1
- Expected FDs: 2 (1 * 2)
- Leaked PTY FDs: 2 FDs (~1 session)

This accurately detects the leak by comparing actual vs expected FD counts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Summary
Investigated PTY file descriptor leak where stats dialog reports
"Potential Leaked PTYs" (e.g., "2 masters - 1 sessions").

## Root Cause
Testing revealed the leak originates in node-pty itself, not our code:
- node-pty opens 2 /dev/ptmx FDs per PTY session
- After kill(), only 1 FD is released
- Result: 1 FD leaked per terminated PTY on macOS

## Fixes Applied
Despite the underlying node-pty limitation, we've made critical
improvements to prevent additional leaks:

1. **Always call ptyProcess.kill()**: Even after sending SIGKILL to
   the process group, we must call ptyProcess.kill() to release PTY FDs
   (packages/core/src/utils/shell.ts:161-164)

2. **Properly dispose exit listener**: Store and dispose the exit
   listener disposable to prevent PTY object retention in memory
   (packages/core/src/services/ShellSessionManager.ts)

## Testing
Added comprehensive integration tests that reproduce and document the leak:
- pty-leak-integration.test.ts: Tests with session manager
- node-pty-direct-leak.test.ts: Tests node-pty directly (proves leak is upstream)

## Documentation
Created PTY_LEAK_INVESTIGATION.md with full analysis, test results,
and recommendations.

## Impact
- Memory cleanup improved (no PTY object retention)
- Remaining 1 FD/session leak is an accepted node-pty limitation
- Stats dialog accurately reports the leak
- Application remains stable with proper FD monitoring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants