Status: Accepted
This proposal describes how to add better support for identifying the status of open connections during a restart or redeploy of Envoy.
- Provide a mechanism for reporting open connections and load inside an Envoy process
- Allow an Envoy fleet roll-out to minimize connection loss
- Non-goal: guarantee zero connection loss during a roll-out
The Envoy process, the data path component of Contour, at times needs to be redeployed. This could be due to an upgrade, a change in configuration, or a node failure forcing a redeployment.
Contour currently implements a `preStop` hook in the container which signals Envoy to begin draining connections. Once Envoy begins draining, the readiness probe on the pod begins to fail, which in turn causes that instance of Envoy to stop accepting new connections.
The main problem is that the `preStop` hook (which sends the `/healthcheck/fail` request) does not wait until Envoy has drained all connections, so when the container is restarted, users can receive errors.
This design adds a new component to Contour which provides a way to determine whether open connections exist in Envoy before sending a `SIGTERM`.
Implement a new sub-command to Contour named `envoy shutdown-manager` which will handle sending the healthcheck-fail request to Envoy and then begin polling the http/https listeners for active connections via the `/stats` endpoint available on `localhost:9001`.
Additionally, an optional `min-open-connections` parameter will be added which will allow users to define the number of open connections at or below which draining is considered complete.
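For illustration, a minimal sketch of that polling loop in Go follows. The gauge names and the five-second poll interval are assumptions for the sketch (the exact stat names depend on how the Envoy listeners are configured), and the helper names are hypothetical; the HTTP wiring that would call these helpers is sketched after the detailed design list below.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// Hypothetical gauge names for the http/https listeners; the real names
// depend on Envoy's listener and stat-prefix configuration.
var connectionGauges = []string{
	"http.ingress_http.downstream_cx_active",
	"http.ingress_https.downstream_cx_active",
}

// activeConnections scrapes Envoy's admin /stats endpoint and sums the
// active downstream connection gauges for the http/https listeners.
func activeConnections(adminAddr string) (int, error) {
	resp, err := http.Get(fmt.Sprintf("http://%s/stats", adminAddr))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	total := 0
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		// /stats lines have the form "stat.name: value".
		parts := strings.SplitN(scanner.Text(), ":", 2)
		if len(parts) != 2 {
			continue
		}
		name := strings.TrimSpace(parts[0])
		for _, gauge := range connectionGauges {
			if name != gauge {
				continue
			}
			value, err := strconv.Atoi(strings.TrimSpace(parts[1]))
			if err != nil {
				return 0, err
			}
			total += value
		}
	}
	return total, scanner.Err()
}

// waitForDrain polls until the number of open connections falls to
// minOpenConnections or below.
func waitForDrain(adminAddr string, minOpenConnections int) {
	for {
		if open, err := activeConnections(adminAddr); err == nil && open <= minOpenConnections {
			return
		}
		time.Sleep(5 * time.Second)
	}
}
```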
A `preStop` hook (https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) in Kubernetes allows time for a container to do any cleanup or extra processing before it is sent a SIGTERM. This lifecycle hook blocks the shutdown sequence while the cleanup runs and returns when the container is ready to be shut down.
- Implement a new sub-command in Contour called `envoy shutdown-manager`
- This command will expose an http endpoint over a specific port (e.g. `8090`) and the path `/shutdown`
- A new container will be added to the Envoy DaemonSet pod which will run this new sub-command
- This new container as well as the Envoy container will be updated to use an http `preStop` hook (see the example manifest below)
- When the pod gets a request to shut down, the `preStop` hook will send a request to `localhost:8090/shutdown`, which will tell Envoy to begin draining connections, then poll for active connections and block until that count reaches zero or the `min-open-connections` threshold is met (see the handler sketch after this list)
- The `terminationGracePeriodSeconds` on the pod will be extended to a larger value (from the default of 30 seconds) to allow time to drain connections; if this limit is reached, Kubernetes will SIGTERM the pod and kill it
- A second endpoint, `/healthz`, will be made available to check the health of this container; it will be wired into a Kubernetes liveness probe in case the shutdown-manager exits
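The two endpoints themselves could then be wired up roughly as below, reusing the hypothetical `waitForDrain` helper from the earlier sketch (both sketches assume the same package); the real sub-command inside Contour may structure this differently.

```go
package main

import "net/http"

func main() {
	// Liveness endpoint: lets the Kubernetes liveness probe restart this
	// container if the shutdown-manager itself stops responding.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Shutdown endpoint: invoked by the preStop hook. Failing Envoy's
	// health check stops new connections from arriving; the handler then
	// blocks until existing connections drain.
	http.HandleFunc("/shutdown", func(w http.ResponseWriter, r *http.Request) {
		// Tell Envoy to begin draining via its admin interface.
		if resp, err := http.Post("http://localhost:9001/healthcheck/fail", "", nil); err == nil {
			resp.Body.Close()
		}

		// Block until connections drain; a min-open-connections of 0
		// waits for a full drain (waitForDrain is sketched above).
		waitForDrain("localhost:9001", 0)
		w.WriteHeader(http.StatusOK)
	})

	http.ListenAndServe(":8090", nil)
}
```

Blocking inside the `/shutdown` handler is what makes the http `preStop` hook effective: Kubernetes does not send the SIGTERM until the hook's request completes or `terminationGracePeriodSeconds` expires.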
The example DaemonSet below shows the shutdown-manager container alongside Envoy, with the http `preStop` hooks and the liveness probe in place:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: envoy
  name: envoy
  namespace: projectcontour
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      annotations:
        prometheus.io/path: /stats/prometheus
        prometheus.io/port: "8002"
        prometheus.io/scrape: "true"
      labels:
        app: envoy
    spec:
      automountServiceAccountToken: false
      containers:
      - command: # <----- New shutdown-manager container
        - /bin/shutdown-manager
        image: stevesloka/envoyshutdown
        imagePullPolicy: Always
        lifecycle:
          preStop: # <----- PreStop hook
            httpGet:
              path: /shutdown
              port: 8090
              scheme: HTTP
        livenessProbe: # <----- Liveness probe
          httpGet:
            path: /healthz
            port: 8090
          initialDelaySeconds: 3
          periodSeconds: 10
        name: shutdown-manager
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - -c
        - /config/envoy.json
        - --service-cluster $(CONTOUR_NAMESPACE)
        - --service-node $(ENVOY_POD_NAME)
        - --log-level info
        command:
        - envoy
        env:
        - name: CONTOUR_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: ENVOY_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: docker.io/envoyproxy/envoy:v1.13.0
        imagePullPolicy: IfNotPresent
        lifecycle: # <----- PreStop hook
          preStop:
            httpGet:
              path: /shutdown
              port: 8090
              scheme: HTTP
        name: envoy
        ports:
        - containerPort: 80
          hostPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          hostPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 4
          httpGet:
            path: /ready
            port: 8002
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
        volumeMounts:
        - mountPath: /config
          name: envoy-config
        - mountPath: /certs
          name: envoycert
        - mountPath: /ca
          name: cacert
      dnsPolicy: ClusterFirst
      initContainers:
      - args:
        - bootstrap
        - /config/envoy.json
        - --xds-address=contour
        - --xds-port=8001
        - --envoy-cafile=/ca/cacert.pem
        - --envoy-cert-file=/certs/tls.crt
        - --envoy-key-file=/certs/tls.key
        command:
        - contour
        env:
        - name: CONTOUR_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: docker.io/projectcontour/contour:master
        imagePullPolicy: Always
        name: envoy-initconfig
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /config
          name: envoy-config
        - mountPath: /certs
          name: envoycert
          readOnly: true
        - mountPath: /ca
          name: cacert
          readOnly: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 300 # <----- Extended from the default 30s to allow draining
      volumes:
      - emptyDir: {}
        name: envoy-config
      - name: envoycert
        secret:
          defaultMode: 420
          secretName: envoycert
      - name: cacert
        secret:
          defaultMode: 420
          secretName: cacert
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 10%
    type: RollingUpdate
```
- Bash scripts could be used to do this; however, they would be more difficult to implement and hard to test.
- Instead of using the http `preStop` hook, we could call a binary to do the checks; however, it is difficult to get such a binary into the Envoy container reliably (we would have to use some shared volume).
The main risk is that something goes wrong in the shutdown logic: either the pod terminates before all active connections are drained, or the shutdown-manager never returns, in which case the pod's termination grace period eventually forces Kubernetes to kill the pod.