Version
20.12.2
Platform
Darwin olivers-mbp.lan 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64 arm Darwin
Subsystem
No response
What steps will reproduce the bug?
The following script creates 2 cluster workers and each cluster worker does the following:
- Start server (A) on port 0 (random port).
- Close server A.
- Once server A has closed, start another server (B) on the same port as the previous server (A).
import cluster from 'node:cluster';
import express from 'express';

if (cluster.isPrimary) {
  const numCPUs = 2;
  console.log(`Master process ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  const a = express();
  const b = express();
  const port = 0;

  console.log(`[${process.pid}] [A] call listen on port`, port);
  const serverA = a.listen(port, () => {
    const randomPort = serverA.address().port;
    console.log(`[${process.pid}] [A] listening on port`, randomPort);

    serverA.close((error) => {
      console.log(`[${process.pid}] [A] close`, error);
      console.log(`[${process.pid}] [B] call listen on port`, randomPort);
      const serverB = b.listen(randomPort, () => {
        console.log(`[${process.pid}] [B] listening on port`, randomPort);
      });
      serverB.on('error', (error) => {
        console.log(`[${process.pid}] [B] error`, error);
      });
    });
  });
}
How often does it reproduce? Is there a required condition?
No response
What is the expected behavior? Why is that the expected behavior?
No error: server B should start listening in both workers.
What do you see instead?
Sometimes, but not always, we see an EADDRINUSE error. For example:
$ node test
Master process 16437 is running
[16438] [A] call listen on port 0
[16439] [A] call listen on port 0
[16438] [A] listening on port 58256
[16438] [A] close undefined
[16438] [B] call listen on port 58256
[16439] [A] listening on port 58256
[16439] [A] close undefined
[16439] [B] call listen on port 58256
[16439] [B] listening on port 58256
[16438] [B] error Error: bind EADDRINUSE null:58256
    at listenOnPrimaryHandle (node:net:1969:18)
    at rr (node:internal/cluster/child:163:12)
    at Worker.<anonymous> (node:internal/cluster/child:113:7)
    at process.onInternalMessage (node:internal/cluster/utils:49:5)
    at process.emit (node:events:530:35)
    at emit (node:internal/child_process:951:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:83:21) {
  errno: -48,
  code: 'EADDRINUSE',
  syscall: 'bind',
  address: null,
  port: 58256
}
It seems to happen more frequently when the CPU is under pressure.
This is not expected because, as far as I understand:
- It should be possible to bind to the same port across cluster workers.
- Server A has been closed by the time we try to bind server B. According to the documentation, the close callback is only called once the server has closed, which I take to mean that the port has been released.
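One possible reading of the log, offered as an assumption rather than a confirmed diagnosis: in round-robin mode the primary process owns the listening socket and shares it between workers, apparently keyed by the requested port, and the cluster documentation notes that listen(0) in a cluster gives every worker the same "random" port. If so, worker 16439's server A (requested as port 0) would still hold 58256 in the primary when worker 16438 asks for 58256 under a different key. The sketch below is a simplified, hypothetical model of that idea; PrimaryModel, listen(), close() and the starting port are invented for illustration and are not Node.js internals. It replays the interleaving from the log.

```javascript
// Hypothetical, simplified model of the cluster primary's handle sharing.
// NOT Node.js's implementation: it only mirrors the assumption that shared
// handles are keyed by the *requested* port rather than the bound port.
class PrimaryModel {
  constructor() {
    this.handles = new Map();    // requested port -> { boundPort, workers }
    this.boundPorts = new Set(); // ports currently bound by the primary
    this.nextRandomPort = 58256; // stand-in for an OS-assigned ephemeral port
  }

  // A worker asks the primary to listen on `port` (0 means "any port").
  listen(workerId, port) {
    let handle = this.handles.get(port);
    if (handle === undefined) {
      const boundPort = port === 0 ? this.nextRandomPort++ : port;
      if (this.boundPorts.has(boundPort)) {
        // Another shared handle still owns this port.
        return { error: 'EADDRINUSE', port: boundPort };
      }
      this.boundPorts.add(boundPort);
      handle = { boundPort, workers: new Set() };
      this.handles.set(port, handle);
    }
    handle.workers.add(workerId);
    return { port: handle.boundPort };
  }

  // A worker closes its server. The port is released only when the last
  // worker sharing the handle has closed it.
  close(workerId, port) {
    const handle = this.handles.get(port);
    if (handle === undefined) return;
    handle.workers.delete(workerId);
    if (handle.workers.size === 0) {
      this.boundPorts.delete(handle.boundPort);
      this.handles.delete(port);
    }
  }
}

// Replay the interleaving from the log.
const primary = new PrimaryModel();
const a1 = primary.listen(1, 0);       // worker 1, server A
const a2 = primary.listen(2, 0);       // worker 2, server A: same "random" port
primary.close(1, 0);                   // worker 1 closes A; worker 2 still shares it
const b1 = primary.listen(1, a1.port); // worker 1, server B: new key, port still bound
primary.close(2, 0);                   // worker 2 closes A; the port is released
const b2 = primary.listen(2, a2.port); // worker 2, server B: succeeds
console.log(a1, a2, b1, b2);
```

Under this model, the close callback in one worker only means that this worker has left the shared handle; the port stays bound for as long as any other worker is still listening through it, which would also explain why a single worker does not reproduce the error.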
Additional information
I have been unable to reproduce the problem with a single cluster worker, which suggests that the problem only occurs when there is contention between cluster workers.
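Until the underlying cause is resolved, one possible mitigation is to retry server B's listen() on EADDRINUSE, giving the other worker time to close its server A. This is a sketch under that assumption: listenWithRetry, retries and delayMs are hypothetical names, not a Node.js or Express API, and the close-then-listen retry follows the EADDRINUSE example in the net.Server documentation. It masks the race rather than fixing it.

```javascript
// Hypothetical helper: start `server` listening on `port`, retrying with a
// growing delay while the port is reported as in use. Resolves with the
// bound port, or rejects with the last error.
function listenWithRetry(server, port, { retries = 5, delayMs = 100 } = {}) {
  return new Promise((resolve, reject) => {
    let attempt = 0;

    const onListening = () => {
      server.off('error', onError);
      resolve(server.address().port);
    };

    const onError = (error) => {
      if (error.code !== 'EADDRINUSE' || attempt >= retries) {
        // Give up: detach our listeners and surface the error.
        server.off('listening', onListening);
        server.off('error', onError);
        reject(error);
        return;
      }
      attempt += 1;
      // Wait a little longer on each attempt, then close and listen again.
      setTimeout(() => {
        server.close();
        server.listen(port);
      }, delayMs * attempt);
    };

    server.on('error', onError);
    server.once('listening', onListening);
    server.listen(port);
  });
}
```

In the worker, server B could be created with http.createServer(b) from node:http and started with listenWithRetry(serverB, randomPort).then(...) instead of b.listen(randomPort, ...).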