Assertion failure crash in TLSWrap::DoWrite with zombie HTTP/2 session (close event not propagated from OS-level CLOSED socket) #61304

@dglittle

Description

Version

v22.13.1

Platform

Darwin 24.3.0 (macOS, arm64)

Subsystem

http2, tls

What steps will reproduce the bug?

Reproducible test case (requires sudo for firewall manipulation)

We created a test that simulates a network "black hole" using macOS pf firewall:

// test-zombie-blackhole.js
// Run with: sudo node test-zombie-blackhole.js

const http2 = require('http2')
const { spawn, execSync } = require('child_process')

const PORT = 8444

async function main() {
    if (process.getuid() !== 0) {
        console.log('Run with: sudo node test-zombie-blackhole.js')
        process.exit(1)
    }

    // Start server as child process (expects key.pem and cert.pem in the working directory)
    const server = spawn('node', ['-e', `
        const http2 = require('http2')
        const fs = require('fs')
        const server = http2.createSecureServer({
            key: fs.readFileSync('./key.pem'),
            cert: fs.readFileSync('./cert.pem'),
        })
        server.on('stream', (s, h) => {
            s.respond({ ':status': 200 })
            s.end('ok')
        })
        server.listen(${PORT}, () => console.log('Server ready'))
        setInterval(() => {}, 10000)
    `], { stdio: 'inherit' })

    await new Promise(r => setTimeout(r, 1000))

    // Connect client
    const session = http2.connect(`https://localhost:${PORT}`, { rejectUnauthorized: false })

    session.on('error', e => console.log('error event:', e.message))
    session.on('close', () => console.log('close event'))

    await new Promise(r => session.on('connect', r))
    console.log('Connected')

    // Make initial request to establish session
    await new Promise((resolve, reject) => {
        const s = session.request({ ':path': '/init' })
        s.on('end', resolve)
        s.on('error', reject)
        s.end()
    })

    // Block traffic with firewall (black hole - no RST/FIN, packets just disappear)
    execSync(`echo "block drop quick proto tcp from any to 127.0.0.1 port ${PORT}" | pfctl -a zombie_test -f -`)
    execSync('pfctl -e 2>/dev/null || true')
    console.log('Firewall blocking traffic')

    // Wait - session should remain "healthy" looking
    await new Promise(r => setTimeout(r, 5000))

    console.log('Session state:', {
        closed: session.closed,
        destroyed: session.destroyed,
        queueSize: session.state?.outboundQueueSize
    })

    // Attempt write to trigger crash
    const syms = Object.getOwnPropertySymbols(session)
    const sockSym = syms.find(s => s.toString().includes('socket'))
    const tls = sockSym ? session[sockSym] : session.socket

    console.log('Writing to socket...')
    tls.write('test')  // CRASHES HERE

    // Cleanup (won't reach here)
    execSync('pfctl -a zombie_test -F all')
    server.kill()
}

main()

What happens

  1. Server and client establish HTTP/2 connection over TLS
  2. Firewall rule creates a "black hole" (packets dropped, no RST/FIN sent)
  3. Session continues to report closed: false, destroyed: false
  4. No error or close events fire
  5. Writing to the TLS socket triggers an assertion failure crash

Original discovery

This was originally discovered in a long-running production process (~2 days) where the TCP socket entered CLOSED state at the OS level (visible via lsof) but Node.js never received the close event.

How often does it reproduce? Is there a required condition?

100% reproducible with the firewall-based test above.

In production, it's intermittent and requires:

  • Long-running HTTP/2 session
  • Network event that causes packet loss without proper TCP RST/FIN (NAT timeout, network partition, etc.)

What is the expected behavior? Why is that the expected behavior?

  1. Close events should propagate - When the OS-level TCP socket enters CLOSED state, this should propagate up through TLS and HTTP/2 layers, setting session.closed = true and emitting appropriate events

  2. Write should fail gracefully - Even if the zombie state occurs, calling .write() should return an error via callback, not crash with an assertion failure
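For reference, this is the contract we expected from .write(): a plain stream.Writable whose underlying write fails reports the error through the write callback and the 'error' event instead of aborting the process. This is only a minimal sketch of that contract; makeFailingSink is a hypothetical stand-in and not the TLS socket itself.

```javascript
const { Writable } = require('node:stream')

// Hypothetical stand-in for a socket whose underlying connection is gone.
function makeFailingSink() {
    return new Writable({
        write(chunk, encoding, callback) {
            // Hand the failure back to the stream machinery instead of asserting
            callback(new Error('underlying connection is gone'))
        },
    })
}

const sink = makeFailingSink()
// Without an 'error' listener the error would be thrown as an uncaught exception
sink.on('error', err => console.log('error event:', err.message))
sink.write('test', err => console.log('write callback:', err && err.message))
```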

What do you see instead?

  1. Close event is lost - All layers above TCP continue to report healthy status
  2. Writes queue up - session.state.outboundQueueSize grows indefinitely (we observed 2815 queued frames)
  3. Crash on write - Calling .write() crashes the process with assertion failure

Detailed debugging evidence

We attached Chrome DevTools inspector to the running process and gathered the following:

OS level (via lsof -p <pid>):

node <pid> 2128u IPv4 ... TCP ...:62689->...:https (CLOSED)

The file descriptor 2128 is in CLOSED state at the OS level.
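To match the OS view against the fd that Node reports, the lsof output can be filtered for TCP sockets in CLOSED state. A rough sketch: closedTcpFds is a hypothetical helper, and it assumes the usual lsof layout where the FD field is digits followed by an access-mode letter (as in 2128u above).

```javascript
// Hypothetical helper: returns the numeric fds of TCP sockets that lsof
// reports as (CLOSED). Assumes the FD field looks like "2128u".
function closedTcpFds(lsofOutput) {
    const fds = []
    for (const line of lsofOutput.split('\n')) {
        const trimmed = line.trim()
        if (!trimmed.includes('TCP') || !trimmed.endsWith('(CLOSED)')) continue
        // The PID column is digits only, so requiring a mode letter skips it
        const fdField = trimmed.split(/\s+/).find(field => /^\d+[rwu]$/.test(field))
        if (fdField) fds.push(parseInt(fdField, 10))
    }
    return fds
}

module.exports = { closedTcpFds }
```

The text passed in would be the output of lsof -p <pid>; any fd it returns can then be compared with the fd seen through the inspector.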

Node level (via inspector):

sessionInfo.session.closed      // false - thinks it's open!
sessionInfo.session.destroyed   // false
sessionInfo.session.connecting  // false

// TLS socket also reports healthy:
socket.destroyed   // false
socket.readable    // true
socket.writable    // true
socket.readyState  // 'open'

// But the outbound queue is stuck:
session.state.outboundQueueSize // 2815 frames queued!
sessionInfo.pendingRejects.size // 3 requests waiting

// The TLS socket's underlying TCP handle IS the CLOSED fd:
socket._handle._parent.fd       // 2128 (the CLOSED socket!)

Ping test (callback never fires):

sessionInfo.session.ping((err, duration) => console.log(err, duration))
// Returns true (ping "sent") but callback NEVER executes
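Since the ping callback never fires on the zombie session, one possible application-level workaround is to put a deadline on the ping and treat a missing acknowledgement as a dead session. A minimal sketch, assuming only the documented session.ping(callback) API; pingWithDeadline and the 5000 ms default are illustrative choices of ours, not part of http2.

```javascript
// Hypothetical helper: resolves with the ping round-trip time, or rejects
// if the ping is not acknowledged within deadlineMs.
function pingWithDeadline(session, deadlineMs = 5000) {
    return new Promise((resolve, reject) => {
        let timer = null
        let settled = false
        const settle = (fn, value) => {
            if (settled) return
            settled = true
            clearTimeout(timer)
            fn(value)
        }

        // session.ping() returns a boolean; false is treated here as "not sent"
        const sent = session.ping((err, duration) => {
            if (err) settle(reject, err)
            else settle(resolve, duration)
        })
        if (!sent) {
            settle(reject, new Error('ping could not be sent'))
            return
        }
        if (!settled) {
            timer = setTimeout(() => {
                settle(reject, new Error(`ping not acknowledged within ${deadlineMs} ms`))
            }, deadlineMs)
        }
    })
}

module.exports = { pingWithDeadline }
```

On rejection the caller could drop the cached session and open a fresh connection. Whether tearing down the zombie session is itself safe in this state is not something this report establishes.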

Fresh connection to same host works fine:

require('http2').connect('https://same-host.com').ping((e,d) => console.log(e,d))
// -> connected!
// -> ping: null 19.748583 (success, 20ms latency)

This proves the server is reachable; only the cached zombie session is broken.

Crash output

We've observed two different assertion failures depending on the scenario:

Crash 1: From reproducible test (firewall black hole)

#  node[7869]: virtual void node::http2::Http2Session::OnStreamAfterWrite(node::WriteWrap *, int) at ../src/node_http2.cc:1741
#  Assertion failed: is_write_in_progress()

----- Native stack trace -----

 1: 0x1043e8d1c node::Assert(node::AssertionInfo const&)
 2: 0x10601d89c node::http2::Http2Session::OnStreamAfterWrite(node::WriteWrap*, int) (.cold.1)
 3: 0x10441a9a4 node::http2::Http2Session::ClearOutgoing(int)
 4: 0x1044f8520 node::WriteWrap::OnDone(int)
 5: 0x1044f8848 node::StreamReq::Done(int, char const*)
 6: 0x104576f2c node::crypto::TLSWrap::InvokeQueued(int, char const*)
 7: 0x104578c88 node::crypto::TLSWrap::OnStreamAfterWrite(node::WriteWrap*, int)
...

Crash 2: From production debugging (long-running zombie session)

#  node[71801]: virtual int node::crypto::TLSWrap::DoWrite(node::WriteWrap *, uv_buf_t *, size_t, uv_stream_t *) at ../src/crypto/crypto_tls.cc:1033
#  Assertion failed: !current_write_

----- Native stack trace -----

 1: 0x102978d1c node::Assert(node::AssertionInfo const&)
 2: 0x1045de9bc node::crypto::TLSWrap::DoWrite(node::WriteWrap*, uv_buf_t*, unsigned long, uv_stream_s*) (.cold.8)
 3: 0x102b09e24 node::crypto::TLSWrap::DoWrite(node::WriteWrap*, uv_buf_t*, unsigned long, uv_stream_s*)
 4: 0x102a85198 node::StreamBase::Write(uv_buf_t*, unsigned long, uv_stream_s*, v8::Local<v8::Object>, bool)
 5: 0x102a89288 int node::StreamBase::WriteString<(node::encoding)1>(v8::FunctionCallbackInfo<v8::Value> const&)
...

----- JavaScript stack trace -----

1: handleWriteReq (node:internal/stream_base_commons:62:21)
2: writeGeneric (node:internal/stream_base_commons:148:15)
3: Socket._writeGeneric (node:net:971:11)
4: Socket._write (node:net:983:8)
5: writeOrBuffer (node:internal/streams/writable:572:12)
6: _write (node:internal/streams/writable:501:10)
7: Writable.write (node:internal/streams/writable:510:10)

Both crashes indicate internal state corruption in the TLS/HTTP2 layers when the underlying connection is silently broken.

Relationship to existing issues

This appears related to, but distinct from, previously fixed issues.

The key difference in our scenario: the OS socket is CLOSED but Node.js never received the close event, leaving the TLS and HTTP/2 layers in an inconsistent state where they believe the connection is healthy.

Additional information

The zombie session persisted for an extended period (potentially hours) before we discovered it via debugging. All requests to the affected origin silently failed (queued but never sent), while requests to other origins continued to work normally.

The session.state.outboundQueueSize growing while bytesWritten remains static is a clear indicator of this zombie state, but there's no documented way to detect this condition - all public APIs (session.closed, socket.writable, etc.) report the connection as healthy.
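The indicator above can be turned into a rough heuristic by comparing two snapshots taken some time apart. This is a sketch under assumptions: snapshot and looksStalled are hypothetical helpers, and reading bytesWritten through session.socket is our assumption about where to observe that counter.

```javascript
// Hypothetical helper: captures the two counters described above.
function snapshot(session) {
    return {
        queueSize: session.state?.outboundQueueSize ?? 0,
        // Assumption: bytesWritten is readable via session.socket
        bytesWritten: session.socket?.bytesWritten ?? 0,
    }
}

// True when the outbound queue grew while no additional bytes were written.
function looksStalled(previous, current) {
    return current.queueSize > previous.queueSize &&
        current.bytesWritten === previous.bytesWritten
}

module.exports = { snapshot, looksStalled }
```

A caller would take a snapshot on an interval and compare it with the previous one. A single stalled interval can also occur on a slow but healthy connection, so this is a hint to investigate rather than proof of the zombie state.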
