Closed
Description
Currently, an unhandled panic in a reconciler is not recovered and will likely crash the manager binary. This is a problem because a panic may be triggered by a single resource in an unexpected state, so one bad resource can prevent all other resources from being processed. Because Kubernetes is likely to restart the manager pod after a crash, the manager can also end up in a restart loop that effectively DoSes the Kubernetes API server.
In my project, I wrote this utility function:
import (
	"fmt"
	"runtime/debug"

	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// MakeSafe wraps r so that panics raised during Reconcile are returned as errors.
func MakeSafe(r reconcile.Reconciler) reconcile.Reconciler {
	return safeReconciler{impl: r}
}

type safeReconciler struct {
	impl reconcile.Reconciler
}

func (r safeReconciler) Reconcile(request reconcile.Request) (result reconcile.Result, err error) {
	defer func() {
		// Named p so it does not shadow the receiver r.
		if p := recover(); p != nil {
			result = reconcile.Result{}
			err = fmt.Errorf("panic: %v [recovered]\n\n%s", p, debug.Stack())
		}
	}()
	return r.impl.Reconcile(request)
}
Every time I pass a reconciler to Complete, I wrap it with this. It ensures that any panics raised by the reconciler are converted to normal errors.
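To show the effect without pulling in controller-runtime, here is a minimal standalone sketch of the same recover-and-convert pattern. The `Request`, `Result`, `Reconciler`, and `ReconcilerFunc` types are hypothetical stand-ins written for this example, not the real `reconcile` package types, and the failing reconciler is invented purely for illustration.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// Request, Result and Reconciler are hypothetical stand-ins for the
// controller-runtime types, so this sketch compiles on its own.
type Request struct{ Name string }

type Result struct{ Requeue bool }

type Reconciler interface {
	Reconcile(req Request) (Result, error)
}

// ReconcilerFunc adapts a plain function to the Reconciler interface.
type ReconcilerFunc func(Request) (Result, error)

func (f ReconcilerFunc) Reconcile(req Request) (Result, error) { return f(req) }

// MakeSafe wraps inner so that a panic during Reconcile is returned as an
// ordinary error instead of unwinding the whole process.
func MakeSafe(inner Reconciler) Reconciler {
	return ReconcilerFunc(func(req Request) (res Result, err error) {
		defer func() {
			if p := recover(); p != nil {
				// Discard any partial result and report the panic with its stack.
				res = Result{}
				err = fmt.Errorf("panic reconciling %q: %v [recovered]\n%s", req.Name, p, debug.Stack())
			}
		}()
		return inner.Reconcile(req)
	})
}

func main() {
	// An illustrative reconciler that panics on one specific resource.
	flaky := ReconcilerFunc(func(req Request) (Result, error) {
		if req.Name == "bad" {
			var m map[string]int
			m["x"] = 1 // writing to a nil map panics
		}
		return Result{}, nil
	})

	safe := MakeSafe(flaky)
	for _, name := range []string{"good", "bad", "also-good"} {
		_, err := safe.Reconcile(Request{Name: name})
		fmt.Printf("%s: failed=%t\n", name, err != nil)
	}
}
```

In this sketch the loop keeps going after the "bad" request, which is the point of the wrapper: the one failing resource surfaces as an error while the others are still processed.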