Skip to content

Unexpected server incoming requests queue after Node.js 20 version (possible performance regression) #57364

Open
@SuperOleg39

Description

@SuperOleg39

Version

v22.14.0

Platform

Darwin macbook-JPF9H46PN7 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64

Subsystem

http

What steps will reproduce the bug?

Repository with reproduction - https://github.com/SuperOleg39/node-js-perf-problem-reproduction

About repo:

server.js - a http server with:

  • /, /test/ and /unknown/ endpoinds with cpu-heavy work emulation
  • 5 parallel request processing limit for this endpoints for event loop overload protection
  • /readyz health check endpoint
  • /metrics metrics endpoint
  • /, /test/ and /unknown/ handlers wrapped in setImmediate to split heavy tasks by different event loop tasks

server.test.js - performance test case for this server:

  • autocannon make a 1000 concurrent requests for 1 minute to server endpoints /, /test/ and /unknown/
  • in parallel run 5 seconds interval with request to /readyz

For reproduction, need to run server.js with different node.js versions, and run server.test.js for every try (more in repository README.md)

How often does it reproduce? Is there a required condition?

It works correctly at 16, 18 and 19 Node.js versions, but not works at 20 and 22 versions.

What is the expected behavior? Why is that the expected behavior?

Event loop lag is small, incoming health check requests is processed fast, <500ms

What do you see instead?

Event loop lag is small, but incoming health check requests is processed slowly, >10s

Additional information

Hello!

We have a integration test for custom requests limiter, working on the principle of awesome under-pressure library - in summary with this requests limiter Node.js application can be responsive even under high load.

This test emulates a high load and endpoints with CPU-heavy work, and expects that fast requests to health checks or metrics will be processed almost immediately.

After update Node.js to v20 in our CI, we saw a failed test with very slow healthcheck responses.

I spend a few days to debug, and main results is:

  • event loop is not overloaded, lag is less than 500ms (okay for this test)
  • in CPU profile I can see microtasks between setImmediate calls
  • but incoming /readyz requests start to be processed after 10-20 seconds
  • problem somewere in core modules, because this lag starts from the request handler - http.createServer(handler) - and also deeper, event before internal server.emit('request') call.
  • if we split healthchecks and metrics endpoints to different http server in the same Node.js process, it respond fast in all Node.js versions - one more confirmation that there is some queue at the level of a particular http server

Metadata

Metadata

Assignees

No one assigned

    Labels

    performanceIssues and PRs related to the performance of Node.js.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions