Description
I have reasoned a lot about readinessProbes and livenessProbes for the JupyterHub Helm chart. There is one conclusion that I'd like to convey and suggest for this Helm chart.
For details about those past experiences, see jupyterhub/zero-to-jupyterhub-k8s#1941.
My conclusion from previous experiences
Every k8s Deployment that runs only a single Pod replica should, if it has a readinessProbe at all, set that probe's failureThreshold very high so that the livenessProbe always triggers before the readinessProbe does.
- This is because it doesn't make sense for a single-replica pod to become unready when no other pod can pick up the slack.
- A readinessProbe can still make sense if we care about being unready until we have successfully started, though that is likely not relevant with only one replica either.
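As a rough illustration of the conclusion above, a single-replica container could pair a normal livenessProbe with a readinessProbe whose failureThreshold is far higher. This is only a sketch: the `/ping` path, port `9000`, and the timing values are assumptions chosen for the example, not settings taken from any chart.

```yaml
# Hypothetical container spec fragment; path, port, and timings are assumed.
livenessProbe:
  httpGet:
    path: /ping
    port: 9000
  periodSeconds: 10
  failureThreshold: 3      # container is restarted after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /ping
    port: 9000
  periodSeconds: 10
  failureThreshold: 1000   # far above the liveness threshold, so the livenessProbe acts first
```

With the same period on both probes, the liveness threshold is reached long before the readiness threshold, so a flaky check does not pull the only pod out of the Service.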
The specific situation
I experienced a lot of network issues that I struggled to track down, but when I did, I noticed that they occurred because the readinessProbe of the traefik pod was flaking from time to time. Whenever that happened, the traefik pod became unready and wouldn't receive traffic from the associated k8s Service. As there was only one replica of the traefik instance, any traffic sent to the traefik k8s Service would fail.
The outcome was intermittent network issues. With a single-replica traefik like I've used (the default), I suggest we increase the readinessProbe failureThreshold to 1000 or similar, or disable the probe entirely. I'm not confident whether the traefik instance supports running with multiple replicas, but if it doesn't, that at least makes sense.
Action point
- Clarify whether the api pod, traefik pod, and controller pod are able to run with multiple replicas.
- Controller is not HA, Traefik is HA, Gateway (api pod) is HA.
- Suggested implementation: for every deployment that defaults to a single replica, enable the readinessProbe only if more than one replica is configured.
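The suggested implementation could look roughly like the Helm template fragment below, which renders the readinessProbe only when more than one replica is requested. The value name `.Values.traefik.replicas`, the `/ping` path, and port `9000` are hypothetical and would need to match whatever the chart actually exposes.

```yaml
# Hypothetical Helm template fragment; value names and probe details are assumed.
containers:
  - name: traefik
    {{- if gt (int .Values.traefik.replicas) 1 }}
    # Rendered only when several replicas can share the traffic.
    readinessProbe:
      httpGet:
        path: /ping
        port: 9000
      periodSeconds: 10
      failureThreshold: 3
    {{- end }}
```

With a single replica the probe is omitted entirely, so the pod is considered ready as soon as its container is running, and only a livenessProbe (if one is defined) can act on failures.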