Add readinessProbe/minReadySeconds to kube-router #4420
This gives better feedback on kube-router health via the DaemonSet resource. Without a readiness probe and minReadySeconds, a DaemonSet can appear "healthy" even when its pods are not. This affects rolling updates, for example, and most notably k0s's own integration tests. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com>
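As a rough illustration of the idea in the commit message, a DaemonSet fragment with both settings might look like the sketch below. This is not the manifest from this PR: the port name `healthz` and port number 20244 appear in the diff, but the `/healthz` path, the labels, the namespace, the image reference, and all timing values are assumptions chosen for the example.

```yaml
# Illustrative sketch only; not the manifest changed in this PR.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-router
  namespace: kube-system        # assumed namespace
spec:
  # A pod counts as available only after it has been ready for this long.
  minReadySeconds: 5            # assumed value
  selector:
    matchLabels:
      k8s-app: kube-router      # assumed label
  template:
    metadata:
      labels:
        k8s-app: kube-router
    spec:
      containers:
        - name: kube-router
          image: example.invalid/kube-router:placeholder  # hypothetical image reference
          ports:
            - name: healthz
              containerPort: 20244
          # Until this probe succeeds, the pod is not counted as ready
          # in the DaemonSet status.
          readinessProbe:
            httpGet:
              path: /healthz    # assumed path
              port: healthz
            periodSeconds: 10   # assumed value
```

With a readiness probe in place, the DaemonSet's ready and available counts reflect whether kube-router actually answers its health check, which is what rolling updates and tests that wait on the DaemonSet rely on.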
3180ab6 to 65fcf39
port: 20244
initialDelaySeconds: 10
port: healthz
initialDelaySeconds: 300
Seems a bit excessive? What's the reasoning for such a long delay?
In my experience, liveness probes are helpful in only a few cases. Forcefully restarting a container over and over usually doesn't help much; it just increases churn and load on a cluster that is probably already busy with other things that cause healthz to answer with non-2xx responses. That's why I prefer high timeouts here. Henning wrote a blog post about this back in the day.
The basic difference here is that an app usually reports failures that it might be able to recover from by itself via the readiness endpoint. Restarting the app won't help in such a case and might even worsen the situation. Unrecoverable errors should make an app terminate itself. This leaves the liveness probe to detect situations in which the app itself is broken: deadlocks, tight endless loops, or being blocked on system calls that usually don't block for long.
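To make that split concrete, here is a hedged container-level sketch of how the two probes could be tuned differently. Only the `healthz` port name and the 300-second liveness delay come from the diff under review. The `/healthz` path, the periods, and the failure thresholds are assumptions, and pointing both probes at the same endpoint is a simplification for the example.

```yaml
# Fragment that would sit under spec.template.spec.containers[] (illustrative only).
readinessProbe:
  # Checked frequently: a failure only marks the pod as not ready,
  # giving the app a chance to recover by itself.
  httpGet:
    path: /healthz            # assumed path
    port: healthz
  periodSeconds: 10           # assumed value
  failureThreshold: 3         # assumed value
livenessProbe:
  # Deliberately slow to act: a restart is a last resort for a process
  # that is stuck (deadlock, endless loop, blocked system call).
  httpGet:
    path: /healthz            # assumed path
    port: healthz
  initialDelaySeconds: 300    # value from the diff under review
  periodSeconds: 30           # assumed value
  failureThreshold: 5         # assumed value
```

The asymmetry is the point: a failing readiness probe is cheap and reversible, while a failing liveness probe restarts the container, so the liveness settings are kept conservative.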
👍
Successfully created backport PR for |