Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add readinessProbe/minReadySeconds to kube-router #4420

Merged
merged 1 commit into from
May 22, 2024

Conversation

twz123
Copy link
Member

@twz123 twz123 commented May 15, 2024

Description

This allows for better feedback of kube-router health via the DaemonSet resource. Without those, it's possible to observe a "healthy" DaemonSet, even if it's not. This affects e.g. rolling updates, and, most notably k0s's own integration tests.

See:

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

Checklist:

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

@twz123 twz123 added bug Something isn't working component/kube-router labels May 15, 2024
@twz123 twz123 added the backport/release-1.30 PR that needs to be backported/cherrypicked to the release-1.30 branch label May 15, 2024
This allows for better feedback of kube-router health via the DaemonSet
resource. Without those, it's possible to observe a "healthy" DaemonSet,
even if it's not. This affects e.g. rolling updates, and, most notably
k0s's own integration tests.

Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com>
@twz123 twz123 force-pushed the kube-router-readinessprobe branch from 3180ab6 to 65fcf39 Compare May 16, 2024 08:56
@twz123 twz123 marked this pull request as ready for review May 16, 2024 10:31
@twz123 twz123 requested a review from a team as a code owner May 16, 2024 10:31
@twz123 twz123 requested review from makhov and jnummelin May 16, 2024 10:31
port: 20244
initialDelaySeconds: 10
port: healthz
initialDelaySeconds: 300
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems bit excessive? What's the reasoning for such a long delay?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, liveness probes are only helpful in very few cases. Forcefully restarting a container over and over again is usually not helping much and will just increase churn/load on a cluster that is probably already busy with other things that lead to healthz answering with non-2xx responses. That's why I prefer high timeouts here. Henning wrote a blog post about this back in the day.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic difference here is that an app usually reports failures that it might be able to recover from by itself via the readiness endpoint. Restarting the app won't help and might even worse the situation in such a case. Unrecoverable errors should make an app terminate itself. This leaves the liveness probe to detect situations in which the app itself is broken due to things like deadlocks, tight endless loops, blocked on system calls that usually don't block for too long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@twz123 twz123 added backport/release-1.30 PR that needs to be backported/cherrypicked to the release-1.30 branch and removed backport/release-1.30 PR that needs to be backported/cherrypicked to the release-1.30 branch labels May 16, 2024
@twz123 twz123 merged commit 3cc1ed9 into k0sproject:main May 22, 2024
77 checks passed
@twz123 twz123 deleted the kube-router-readinessprobe branch May 22, 2024 18:56
@k0s-bot
Copy link

k0s-bot commented May 22, 2024

Successfully created backport PR for release-1.30:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/release-1.30 PR that needs to be backported/cherrypicked to the release-1.30 branch bug Something isn't working component/kube-router
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants