The readinessProbe for traefik pod (1 replica) is too sensitive #390

@consideRatio

I have reasoned a lot about readinessProbes and livenessProbes for the JupyterHub Helm chart. There is one conclusion from that work that I'd like to convey and suggest for this Helm chart.

For details about those past experiences, see jupyterhub/zero-to-jupyterhub-k8s#1941.

My conclusion from previous experiences

All k8s Deployment resources that have only a single Pod replica should, if they have a readinessProbe at all, set its failureThreshold very high so that the livenessProbe always triggers before the readinessProbe.

  • This is because it doesn't make sense for a single-replica pod to become unready when there is no other pod that can pick up the slack.
  • A readinessProbe can still make sense if we care about the pod being unready until it has started successfully, but that is likely not relevant with only one replica either.
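To make the "livenessProbe triggers first" condition concrete, here is a minimal sketch of the arithmetic. It assumes the probe fails on every attempt and approximates the time until the kubelet acts as periodSeconds × failureThreshold, ignoring initialDelaySeconds and timeoutSeconds. The `Probe` class and `liveness_triggers_first` helper are hypothetical names for illustration, not part of any chart or library; the defaults of 10 and 3 are the Kubernetes defaults for periodSeconds and failureThreshold.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical stand-in for the timing fields of a k8s probe."""
    period_seconds: int = 10    # Kubernetes default periodSeconds
    failure_threshold: int = 3  # Kubernetes default failureThreshold

    def seconds_until_failed(self) -> int:
        # Rough time of consecutive failures before the kubelet acts.
        # Assumption: ignores initialDelaySeconds and timeoutSeconds.
        return self.period_seconds * self.failure_threshold


def liveness_triggers_first(liveness: Probe, readiness: Probe) -> bool:
    # True if a continuously failing pod is restarted by the livenessProbe
    # before the readinessProbe would mark it unready.
    return liveness.seconds_until_failed() < readiness.seconds_until_failed()


liveness = Probe(period_seconds=10, failure_threshold=3)
readiness = Probe(period_seconds=10, failure_threshold=1000)

print(liveness.seconds_until_failed())   # 30
print(readiness.seconds_until_failed())  # 10000
print(liveness_triggers_first(liveness, readiness))  # True
```

With both probes at the default failureThreshold of 3, the two would fire at about the same time, which is the situation this issue argues against for single-replica deployments.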

The specific situation

I experienced a lot of network issues that I struggled to track down. When I finally did, I found the cause: the readinessProbe of the traefik pod was flaking from time to time. Whenever that happened, the traefik pod became unready and wouldn't receive traffic from the associated k8s Service. As there was only 1 replica of the traefik instance, any traffic sent to the traefik k8s Service would fail.

The outcome was intermittent network issues. With a single-replica traefik like I've used (the default), I suggest we increase the readinessProbe failureThreshold to 1000 or similar, or disable the probe entirely. I'm not sure whether the traefik instance supports running with multiple replicas, but if it doesn't, that at least makes sense.

Action point

  • Clarify whether the api pod, traefik pod, and controller pod are able to run with multiple replicas.
    • Controller is not HA, Traefik is HA, Gateway (api pod) is HA.
  • Suggested implementation: let every deployment have a readinessProbe that is enabled only when more than one replica is used.
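The suggested implementation could look roughly like the following sketch, written in Python rather than as a Helm template. The function `container_probes` and the probe values (`/healthz`, port 8080) are placeholders for illustration, not the chart's actual settings. Always keeping the livenessProbe is an assumption on my part; the suggestion above only covers the readinessProbe.

```python
from typing import Any, Dict


def container_probes(replicas: int, probe: Dict[str, Any]) -> Dict[str, Any]:
    """Return the probe fields for a container spec (hypothetical helper).

    The readinessProbe is included only when more than one replica exists,
    so a lone pod is never removed from its Service's endpoints.
    """
    # Assumption: the livenessProbe is kept regardless of replica count.
    probes: Dict[str, Any] = {"livenessProbe": dict(probe)}
    if replicas > 1:
        probes["readinessProbe"] = dict(probe)
    return probes


# Placeholder probe definition, not taken from the chart.
example_probe = {"httpGet": {"path": "/healthz", "port": 8080}}

single = container_probes(1, example_probe)
multiple = container_probes(2, example_probe)

print(sorted(single))    # ['livenessProbe']
print(sorted(multiple))  # ['livenessProbe', 'readinessProbe']
```

In a chart, the same condition would be expressed on the replica count value of each deployment, so the controller (not HA) would never get a readinessProbe while traefik and the gateway would get one only when scaled beyond one replica.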
