
feat: add cluster management #39

Open
phvalguima wants to merge 1 commit into main from add-cluster-management

Conversation

@phvalguima

This PR allows running several replicas of the server in parallel.

@github-actions

⚠️ Deprecation Warning: The deny-licenses option is deprecated for possible removal in the next major release. For more information, see issue 997.

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️ No snapshots were found for the head SHA 79285ca.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None


Copilot AI left a comment


Pull request overview

Adds infrastructure intended to support running multiple server replicas (clustered workers) by improving upstream connection reuse, server listen configuration, and introducing a cluster startup entrypoint for local/Docker runs.

Changes:

  • Reuse a single shared undici Agent per worker to avoid per-request connection pools.
  • Make server startup more cluster-/ops-friendly (numeric port parsing, configurable TCP backlog, bind address, and graceful shutdown via server.close()).
  • Add start:cluster and switch Docker CMD to run dist/cluster.js.
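The cluster entrypoint itself (dist/cluster.js) is not shown in this diff; a minimal sketch of what such an entrypoint typically looks like with node:cluster follows. All names here (startCluster, CLUSTER_WORKERS) are assumptions, not taken from the PR.

```typescript
import cluster from "node:cluster";
import os from "node:os";

// Hypothetical sketch: fork one worker per CPU (or CLUSTER_WORKERS), and
// respawn workers that exit so a single crash doesn't shrink capacity.
function startCluster(workerEntrypoint: () => void): void {
  if (cluster.isPrimary) {
    const workers = Number(process.env.CLUSTER_WORKERS || os.cpus().length);
    for (let i = 0; i < workers; i++) cluster.fork();
    cluster.on("exit", (worker, code) => {
      console.log(`worker ${worker.process.pid} exited (code ${code}); restarting`);
      cluster.fork();
    });
  } else {
    workerEntrypoint(); // each worker runs the regular HTTP server entrypoint
  }
}
```

Because each worker is a separate process, per-process state such as the shared undici Agent below is duplicated once per worker, which is why its pool size matters for total upstream connections.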

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Summary per file:

src/routes/responses/handleOneTurn.ts: Introduces a shared undici dispatcher and uses it for OpenAI requests.
src/index.ts: Updates listen configuration (port parsing, backlog, bind host) and changes shutdown to server.close().
package.json: Adds start:cluster script targeting dist/cluster.js.
eslint.config.js: Adds additional globals to satisfy linting for newly introduced runtime symbols/types.
Dockerfile: Changes container entrypoint from dist/index.js to dist/cluster.js.


Comment on lines +15 to +23

  const port = parseInt(String(process.env.PORT || 3000), 10);
  const highWaterMark = parseInt(process.env.STREAM_HIGH_WATER_MARK || "65536", 10);
+ const backlog = parseInt(process.env.TCP_BACKLOG || "5000", 10);

  // Start server with configurable highWaterMark for SSE streaming backpressure
- createServer({ highWaterMark }, app).listen(port, () => {
-   logger.info({ port, highWaterMark }, "Server started");
+ // and configurable TCP backlog (SO_MAXCONN / listen queue depth)
+ const server = createServer({ highWaterMark }, app);
+ server.listen(port, "0.0.0.0", backlog, () => {
+   logger.info({ port, highWaterMark, backlog, pid: process.pid }, "Server started");

Copilot AI Apr 15, 2026


port/highWaterMark/backlog are parsed with parseInt but not validated. If PORT or TCP_BACKLOG are set to a non-numeric value, parseInt returns NaN and server.listen(...) will throw at startup. Consider validating (Number.isFinite / integer range) and either falling back to defaults or exiting with a clear configuration error message.
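A minimal sketch of the suggested validation (the helper name and error wording are illustrative, not from the PR):

```typescript
// Illustrative helper, not from the PR: parse an env var as a non-negative
// integer, exiting with a clear configuration error instead of letting
// server.listen(...) throw on NaN later.
function parsePortLikeEnv(name: string, fallback: number, raw = process.env[name]): number {
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    console.error(`Invalid ${name}=${JSON.stringify(raw)}; expected a non-negative integer`);
    process.exit(1);
  }
  return value;
}

const port = parsePortLikeEnv("PORT", 3000);
const backlog = parsePortLikeEnv("TCP_BACKLOG", 5000);
```

Failing fast with an explicit message makes the misconfiguration obvious in container logs rather than surfacing as an opaque listen() error.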

Copilot uses AI. Check for mistakes.
phvalguima (Author)


True, but that happens only once, at startup, and will trigger a crash-loop restart of the container.

Comment on lines +21 to +23

  const server = createServer({ highWaterMark }, app);
  server.listen(port, "0.0.0.0", backlog, () => {
    logger.info({ port, highWaterMark, backlog, pid: process.pid }, "Server started");

Copilot AI Apr 15, 2026


Binding explicitly to "0.0.0.0" forces IPv4-only listening; on IPv6-only hosts this can prevent the server from accepting connections. Consider making the bind address configurable (e.g., HOST env var) or omitting the host argument to let Node bind to the system default.
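One way to sketch the suggestion (HOST is an assumed env var name, not from the PR; port 0 picks an ephemeral port purely for demonstration):

```typescript
import http from "node:http";

const server = http.createServer((_req, res) => res.end("ok"));
const host = process.env.HOST; // assumed env var, e.g. "::" or "0.0.0.0"; unset -> Node default
const backlog = 511;

function onListening(): void {
  console.log("listening on", server.address());
  server.close(); // demo only: shut straight back down
}

// Omitting the host argument lets Node bind the system default
// (dual-stack IPv6 + IPv4 where the OS supports it).
if (host) server.listen(0, host, backlog, onListening);
else server.listen(0, backlog, onListening);
```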

Comment on lines 28 to 36

  process.on("SIGINT", () => {
-   logger.info("Server shutting down (SIGINT)");
-   process.exit(0);
+   logger.info({ pid: process.pid }, "Server shutting down (SIGINT)");
+   server.close(() => process.exit(0));
  });

  process.on("SIGTERM", () => {
-   logger.info("Server shutting down (SIGTERM)");
-   process.exit(0);
+   logger.info({ pid: process.pid }, "Server shutting down (SIGTERM)");
+   server.close(() => process.exit(0));
  });

Copilot AI Apr 15, 2026


server.close() waits for existing connections to finish; with long-lived SSE streams this callback may never run, causing SIGTERM/SIGINT shutdown to hang. Consider adding a shutdown timeout and using server.closeIdleConnections()/server.closeAllConnections() (Node >=20) or tracking sockets to force-close after the grace period.
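A bounded-grace-period shutdown along those lines could look like this sketch (SHUTDOWN_GRACE_MS and the logger stub are assumptions for illustration; closeIdleConnections/closeAllConnections are available on http.Server since Node 18.2):

```typescript
import http from "node:http";

// Stub logger so the sketch is self-contained; the real code uses its own logger.
const logger = { info: (ctx: object, msg: string) => console.log(msg, ctx) };
const server = http.createServer((_req, res) => res.end("ok"));
const GRACE_MS = Number(process.env.SHUTDOWN_GRACE_MS || 10_000); // assumed env var

function shutdown(signal: NodeJS.Signals): void {
  logger.info({ pid: process.pid, signal }, "Server shutting down");
  server.close(() => process.exit(0)); // stop accepting; exit once sockets drain
  server.closeIdleConnections();       // drop keep-alive sockets with no active request
  setTimeout(() => {
    server.closeAllConnections();      // force-close long-lived SSE streams
    process.exit(0);
  }, GRACE_MS).unref();                // unref so the timer never keeps the process alive
}

process.on("SIGINT", shutdown);
process.on("SIGTERM", shutdown);
```

Without the forced close, a single open SSE stream keeps server.close()'s callback from ever firing, so Kubernetes would eventually SIGKILL the pod instead of it exiting cleanly.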

Comment on lines +26 to +33

  // Shared undici Agent per worker process — avoids creating a new connection pool per request.
  // Configurable via UPSTREAM_MAX_CONNECTIONS (connections per origin) and UPSTREAM_KEEP_ALIVE_TIMEOUT_MS.
  const sharedDispatcher = new Agent({
    allowH2: true,
    connections: parseInt(process.env.UPSTREAM_MAX_CONNECTIONS || "128", 10),
    pipelining: 1,
    keepAliveTimeout: parseInt(process.env.UPSTREAM_KEEP_ALIVE_TIMEOUT_MS || "30000", 10),
    connectTimeout: parseInt(process.env.UPSTREAM_CONNECT_TIMEOUT_MS || "30000", 10),

Copilot AI Apr 15, 2026


The inline comment says the Agent is configurable via UPSTREAM_MAX_CONNECTIONS and UPSTREAM_KEEP_ALIVE_TIMEOUT_MS, but the code also reads UPSTREAM_CONNECT_TIMEOUT_MS. Update the comment to include the connect-timeout env var (or drop the env-var list) so the documentation matches behavior.

Comment on lines +26 to +33

Copilot AI Apr 15, 2026


The Agent options are derived via parseInt(...) without validating the result. If any of these env vars are set to a non-numeric value, parseInt will return NaN and undici may throw or behave unexpectedly at runtime. Consider coercing with Number(...) and falling back to defaults (or throwing a clear startup error) when the parsed value is not a finite integer.

Suggested change
- // Shared undici Agent per worker process — avoids creating a new connection pool per request.
- // Configurable via UPSTREAM_MAX_CONNECTIONS (connections per origin) and UPSTREAM_KEEP_ALIVE_TIMEOUT_MS.
- const sharedDispatcher = new Agent({
-   allowH2: true,
-   connections: parseInt(process.env.UPSTREAM_MAX_CONNECTIONS || "128", 10),
-   pipelining: 1,
-   keepAliveTimeout: parseInt(process.env.UPSTREAM_KEEP_ALIVE_TIMEOUT_MS || "30000", 10),
-   connectTimeout: parseInt(process.env.UPSTREAM_CONNECT_TIMEOUT_MS || "30000", 10),
+ function getFiniteIntegerEnv(name: string, fallback: number): number {
+   const rawValue = process.env[name];
+   if (rawValue === undefined) {
+     return fallback;
+   }
+   const parsedValue = Number(rawValue);
+   return Number.isFinite(parsedValue) && Number.isInteger(parsedValue) ? parsedValue : fallback;
+ }
+
+ // Shared undici Agent per worker process — avoids creating a new connection pool per request.
+ // Configurable via UPSTREAM_MAX_CONNECTIONS (connections per origin),
+ // UPSTREAM_KEEP_ALIVE_TIMEOUT_MS, and UPSTREAM_CONNECT_TIMEOUT_MS.
+ const sharedDispatcher = new Agent({
+   allowH2: true,
+   connections: getFiniteIntegerEnv("UPSTREAM_MAX_CONNECTIONS", 128),
+   pipelining: 1,
+   keepAliveTimeout: getFiniteIntegerEnv("UPSTREAM_KEEP_ALIVE_TIMEOUT_MS", 30000),
+   connectTimeout: getFiniteIntegerEnv("UPSTREAM_CONNECT_TIMEOUT_MS", 30000),
