[Proposal] Unified Gateway Container: Consolidate Router and Policy Engine into a Single Container #939
Replies: 4 comments 10 replies
- Regarding log prefixes, I don't see an Alternative 3. To me it sounds like log prefixes make sense.
- What should be the name of the unified gateway container? @nuwand, @pubudu538, @malinthaprasan, @Krishanx92, @VirajSalaka
  Some suggestions.
- So we still have the gateway-controller container, right? If so, what if we call these the controller and the processor?
- We may need to verify the functionality on Windows; since WSL is available, we should be able to run this without any issue. Is there a way to get the memory and CPU usage for the Router and the Policy Engine separately?

## Summary
Consolidate the Router (Envoy) and the Policy Engine into a single container image, using `tini` as the PID 1 init process with a bash entrypoint script managing both processes. Log lines are prefixed (`[rtr]` / `[pol]`) to distinguish output, or prefixes are skipped entirely (see Alternative 1).

## Motivation
## Proposal
Run both processes in a single container with `tini` as PID 1, tagging each log line with `[rtr]` or `[pol]`. The unified container simplifies deployment while reducing latency. Note: the ext_proc gRPC communication between the Router and the Policy Engine cannot be eliminated, since Envoy must call the Policy Engine for every request. What changes is the transport: UDS within the same container replaces TCP over container networking. Since the Router depends entirely on the Policy Engine for request processing, treating them as an atomic deployment unit reflects their operational coupling. The entrypoint script handles process lifecycle and signal propagation correctly, relying on `tini` for proper signal handling and zombie reaping.

## Changes Required
- `Dockerfile`: create a unified image containing both the Router (Envoy) and Policy Engine binaries
- `docker-entrypoint.sh`: new entrypoint script to launch and monitor both processes, with `tini` as the init process (many base images include it, or install it via the package manager)
- `pkg/config/config.go`: add a UDS socket path option to `PolicyEngineConfig` (alternative to host:port)
- `pkg/xds/translator.go`: update `createPolicyEngineCluster()` to use `core.Address_Pipe` when UDS is configured; the Policy Engine listens on a UDS path (e.g. `/var/run/policy-engine.sock`) instead of a TCP port
- `Makefile`: update build targets to produce a single unified gateway image
- `docker-compose.yaml`: replace the separate router and policy-engine services with a single gateway service; update `gateway/it/docker-compose.test.yaml` to use the unified gateway image, and adjust test steps that target the policy-engine container directly (e.g., the admin endpoint on port 9002 and metrics on port 9003)

## Process Architecture
The unified container uses this process tree:
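A plausible sketch of that tree, assuming the entrypoint forks both binaries directly:

```
tini (PID 1)
 └─ docker-entrypoint.sh
     ├─ envoy            (Router)
     └─ policy-engine    (Policy Engine)
```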
Signal handling spans three layers: `tini` at PID 1, the entrypoint script, and the two managed processes.

Shutdown timeout ownership: whoever initiates the shutdown owns the timeout. Orchestrator-initiated shutdown (`docker stop` / k8s) → the orchestrator owns the timeout (`--stop-timeout` / `terminationGracePeriodSeconds`). Crash-initiated shutdown → the entrypoint owns the timeout, configured via the `SHUTDOWN_TIMEOUT` env var (default 10s, matching Docker's default `--stop-timeout`).

## Reference Entrypoint Script
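The original script did not survive here; the following is a hypothetical sketch of the shape described in this proposal, with `sleep` standing in for the real binaries so it can run anywhere:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of docker-entrypoint.sh, not the actual script:
# `sleep` stands in for the envoy and policy-engine binaries.
# Requires bash >= 5.1 for `wait -n` with explicit PID arguments.
set -uo pipefail

# Crash-path deadline (default 10s, matching Docker's --stop-timeout).
SHUTDOWN_TIMEOUT="${SHUTDOWN_TIMEOUT:-10}"

# Launch both processes; process substitution tags every output line.
sleep 1 > >(while IFS= read -r l; do printf '[rtr] %s\n' "$l"; done) 2>&1 &
ROUTER_PID=$!   # stand-in for: envoy -c /etc/envoy/envoy.yaml
sleep 5 > >(while IFS= read -r l; do printf '[pol] %s\n' "$l"; done) 2>&1 &
POLICY_PID=$!   # stand-in for: policy-engine

# Flow B (orchestrator-initiated): forward SIGTERM and simply wait;
# the orchestrator owns the force-kill deadline.
on_term() {
  kill -TERM "$ROUTER_PID" "$POLICY_PID" 2>/dev/null
  wait "$ROUTER_PID" "$POLICY_PID" 2>/dev/null
  exit 0
}
trap on_term TERM INT

# Flow A (crash-initiated): explicit PIDs so `wait -n` returns for the
# managed processes, not for the log-reader subshells.
rc=0
wait -n "$ROUTER_PID" "$POLICY_PID" || rc=$?
echo "[entrypoint] a managed process exited (rc=$rc); stopping the peer"
kill -TERM "$ROUTER_PID" "$POLICY_PID" 2>/dev/null
wait "$ROUTER_PID" "$POLICY_PID" 2>/dev/null
# A fuller script would enforce $SHUTDOWN_TIMEOUT with a KILL fallback
# here, then propagate the status: exit "$rc"
```

The stand-in commands and the exact tagging loop are assumptions; only the overall structure (two background children, a TERM/INT trap, and `wait -n` with explicit PIDs) follows the proposal text.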
Why explicit PIDs in `wait -n`: the log-tagging process substitution spawns reader subshells. Without explicit PIDs, `wait -n` might return when a reader exits rather than when the Router or the Policy Engine exits.

Why the trap doesn't loop with a timeout: in Flow B, the orchestrator (Docker/K8s) is responsible for the force-kill deadline. The entrypoint just needs to stay alive and wait; if both processes are hung, the orchestrator will SIGKILL the entire container. Adding a redundant timeout loop here would just duplicate the orchestrator's responsibility.
## Inter-Process Communication (IPC) via UDS
The Router (Envoy) communicates with the Policy Engine via gRPC for the ext_proc filter. This communication cannot be eliminated — it's fundamental to how policies are executed.
Current architecture (two containers): the Router calls the Policy Engine over ext_proc gRPC on TCP, crossing the container network.

Proposed architecture (unified container with UDS): the Router calls the Policy Engine over ext_proc gRPC on a Unix domain socket shared inside the container, so no traffic leaves the network namespace.
Gateway-Controller xDS cluster configuration change:
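Presumably along these lines, shown as static bootstrap YAML for illustration; the real change would emit the equivalent `core.Address_Pipe` proto from `createPolicyEngineCluster()`, and the cluster name is an assumption:

```yaml
# Illustrative only: an Envoy cluster whose endpoint is a UDS pipe
# instead of a host:port socket address.
clusters:
- name: policy_engine
  connect_timeout: 1s
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options: {}   # ext_proc is gRPC, so HTTP/2 is required
  load_assignment:
    cluster_name: policy_engine
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            pipe:
              path: /var/run/policy-engine.sock
```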
Policy Engine listener change: the Policy Engine binds its ext_proc gRPC server to the configured UDS path (e.g. `net.Listen("unix", path)` in Go) instead of binding a TCP port.
## Drawbacks
## Alternatives Considered
### Alternative 1: No Log Prefixes (NGINX Ingress Controller approach)
- Without prefixes, a log line like `"started listening on port X"` is ambiguous: is that the Router or the Policy Engine?
- The entrypoint drops the log-tagging `while read` loop entirely.
- Grepping `kubectl logs` or `docker logs` becomes unreliable when both processes use similar log patterns.

The entrypoint would simplify to:
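A hedged sketch of that simplified entrypoint, again with `sleep` standing in for the real binaries:

```shell
#!/usr/bin/env bash
# Hypothetical simplified entrypoint: no log tagging, so children write
# straight to the container's stdout/stderr and no reader subshells
# exist. `sleep` stands in for the real binaries.
set -uo pipefail

sleep 1 &   # stand-in for: envoy -c /etc/envoy/envoy.yaml
ROUTER_PID=$!
sleep 5 &   # stand-in for: policy-engine
POLICY_PID=$!

trap 'kill -TERM "$ROUTER_PID" "$POLICY_PID" 2>/dev/null; wait; exit 0' TERM INT

# With no subshells, a bare `wait -n` returns when either child exits.
rc=0
wait -n || rc=$?
kill -TERM "$ROUTER_PID" "$POLICY_PID" 2>/dev/null
wait "$ROUTER_PID" "$POLICY_PID" 2>/dev/null
# A real entrypoint would now: exit "$rc"
```

Because there are no process substitutions, `wait -n` needs no explicit PID list here, which is the main simplification over the tagged variant.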
### Alternative 2: In-Container Process Supervisor (supervisord)
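A minimal sketch of what this alternative could look like, assuming a `supervisord.conf` along these lines (program names and binary paths are illustrative):

```ini
[supervisord]
nodaemon=true                 ; run in the foreground as the container's main process
logfile=/dev/null             ; don't write supervisord's own log to a file
logfile_maxbytes=0

[program:router]
command=/usr/local/bin/envoy -c /etc/envoy/envoy.yaml
stdout_logfile=/dev/stdout    ; pass child output through to container logs
stdout_logfile_maxbytes=0
redirect_stderr=true
autorestart=false             ; let the container exit if a process dies

[program:policy-engine]
command=/usr/local/bin/policy-engine
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
redirect_stderr=true
autorestart=false
```

The trade-off is pulling a Python-based supervisor into the image for functionality the small bash entrypoint already covers.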
## Compatibility
## Migration Steps
- Update `docker-compose.yaml` or Kubernetes manifests to use the single gateway service instead of the separate router and policy-engine services.

## Unresolved Questions

- Should log lines carry prefixes (`[rtr]` / `[pol]`), or should prefixes be skipped entirely, as the NGINX Ingress Controller does? (see Alternative 1)