Windows: CLI fails with 'Daemon failed to start' even when daemon is running #37

@jop3

Description

On Windows (tested with Git Bash/MSYS2), the CLI consistently fails with "Daemon failed to start" even though the daemon actually starts successfully and is listening on the correct port.

Environment

  • OS: Windows 10/11 (MSYS_NT-10.0-26200)
  • Node.js: v22.19.0
  • agent-browser: v0.4.4
  • Shell: Git Bash (MSYS2)

Steps to Reproduce

  1. Clean up any existing daemon:

    npx kill-port 50838
    rm -f "$LOCALAPPDATA"/Temp/agent-browser-*
  2. Run any CLI command:

    agent-browser open http://example.com
  3. Observe error: ✗ Daemon failed to start

  4. Check if daemon is actually running:

    netstat -an | grep 50838
    # Shows: TCP 127.0.0.1:50838 LISTENING
    
    ls "$LOCALAPPDATA"/Temp/agent-browser-*
    # Shows PID and port files exist

Analysis

Through extensive debugging, I found:

  1. The daemon DOES start successfully - it creates PID/port files and listens on port 50838
  2. The CLI's daemon_ready() check fails - even though TCP connections work fine from Node.js or PowerShell
  3. The CLI then calls cleanupSocket() which deletes the PID files
  4. The CLI tries to spawn a new daemon which fails because the port is already in use
  5. After 50 retry attempts, it gives up with "Daemon failed to start"

The issue appears to be in cli/src/connection.rs in the daemon_ready() function:

#[cfg(windows)]
fn daemon_ready(session: &str) -> bool {
    let port = get_port_for_session(session);
    TcpStream::connect_timeout(
        &format!("127.0.0.1:{}", port).parse().unwrap(),
        Duration::from_millis(50),
    )
    .is_ok()
}

The 50ms timeout may be too aggressive, or Rust's TcpStream::connect_timeout may behave differently on Windows in a way that causes intermittent failures.
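
One way to tell these two hypotheses apart would be to surface the error kind instead of collapsing the result with .is_ok(). This is a minimal diagnostic sketch, not code from the repository: probe_daemon is a hypothetical helper, and it takes the port directly rather than calling get_port_for_session().

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical diagnostic helper (not part of agent-browser):
/// reports why a probe failed rather than only that it failed.
fn probe_daemon(port: u16, timeout: Duration) -> Result<(), io::ErrorKind> {
    let addr = SocketAddr::from((Ipv4Addr::LOCALHOST, port));
    TcpStream::connect_timeout(&addr, timeout)
        .map(|_stream| ()) // the stream is dropped here, closing the probe connection
        .map_err(|err| err.kind())
}
```

Logging the returned kind at the call site would show whether the 50ms probe fails with a timeout or with some other error, which should help narrow down the cause.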

Workaround

Manually start the daemon first, then use the CLI:

# Start daemon manually
node "$(npm root -g)/agent-browser/dist/daemon.js" &
sleep 2

# Now CLI commands work (connects to existing daemon)
agent-browser get url

Suggested Fix

  1. Increase the connect_timeout in daemon_ready() from 50ms to 200ms
  2. Add retry logic within daemon_ready() itself
  3. Don't call cleanupSocket() if the daemon is actually listening (check port before cleanup)
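
Fixes 1 and 2 could be combined roughly as follows. This is a sketch under assumptions, not a patch against cli/src/connection.rs: the function name, its signature, the attempt count, and the 50ms pause between attempts are all illustrative, and the port is passed in rather than looked up per session.

```rust
use std::net::{Ipv4Addr, SocketAddr, TcpStream};
use std::thread;
use std::time::Duration;

/// Illustrative variant of daemon_ready(): probes the loopback port
/// up to `attempts` times, each with its own connect timeout.
fn daemon_ready_with_retry(port: u16, timeout: Duration, attempts: u32) -> bool {
    let addr = SocketAddr::from((Ipv4Addr::LOCALHOST, port));
    for attempt in 0..attempts {
        if TcpStream::connect_timeout(&addr, timeout).is_ok() {
            return true;
        }
        // Pause briefly between probes; skipped after the final attempt.
        if attempt + 1 < attempts {
            thread::sleep(Duration::from_millis(50));
        }
    }
    false
}
```

For fix 3, the same check could guard the cleanup path: call something like daemon_ready_with_retry(port, Duration::from_millis(200), 3) first, and only run cleanupSocket() and spawn a new daemon if it returns false.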

Additional Notes

  • The same daemon works perfectly when connected via Node.js net.Socket
  • PowerShell Test-NetConnection also connects successfully
  • The TIME_WAIT entries in netstat show that the CLI IS making connections; they are just not being detected as successful
