-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(controller): Add liveness probe #5875
Conversation
Signed-off-by: Alex Collins <alex_collins@intuit.com>
@sarabala1979 while easy to implement, hard to create the circumstances to test, as you need to simulate a random crash in the controller, which does not happen. Is the test to just see if we don't see any more situations when we have pending workflows? Maybe need to dig deeper? |
cmd/workflow-controller/healthz.go
Outdated
func healthz(ctx context.Context, wfclientset wfclientset.Interface, managedNamespace string) { | ||
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) { | ||
err := func() error { | ||
list, err := wfclientset.ArgoprojV1alpha1().Workflows(managedNamespace).List(ctx, metav1.ListOptions{LabelSelector: "!" + common.LabelKeyPhase}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instance ID needed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instanceID?
cmd/workflow-controller/healthz.go
Outdated
func healthz(ctx context.Context, wfclientset wfclientset.Interface, managedNamespace string) { | ||
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) { | ||
err := func() error { | ||
list, err := wfclientset.ArgoprojV1alpha1().Workflows(managedNamespace).List(ctx, metav1.ListOptions{LabelSelector: "!" + common.LabelKeyPhase}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instanceID?
# This takes advantage of the fact that if the metrics service has died, | ||
# then the controller has died. | ||
# In testing, it appears to take 60-90s from failure to restart. | ||
- containerPort: 6060 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this port also exposes pprof endpoints, but these are already accessible
Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: Alex Collins <alex_collins@intuit.com>
func (wfc *WorkflowController) Healthz(w http.ResponseWriter, r *http.Request) { | ||
ctx := r.Context() | ||
instanceID := wfc.Config.InstanceID | ||
instanceIDSelector := func() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whynowy added instance ID, I had to make this a receiver func on WorkflowController
because main.go
does not know the instance ID
} | ||
return nil | ||
}() | ||
log.WithField("err", err). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now log diagnostics time it is requested, this will help deal with problems.
@whynowy ready for review again |
Codecov Report
@@ Coverage Diff @@
## master #5875 +/- ##
==========================================
- Coverage 47.39% 47.30% -0.10%
==========================================
Files 247 248 +1
Lines 15623 15649 +26
==========================================
- Hits 7405 7403 -2
- Misses 7286 7313 +27
- Partials 932 933 +1
Continue to review full report at Codecov.
|
Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: Alex Collins <alex_collins@intuit.com>
No description provided.