Reconciler panics should not crash the manager #797

Closed
@ekuefler

Currently, an unhandled panic in a reconciler is not recovered and will likely crash the manager binary. This is a problem because a panic can be triggered by a single resource in an unexpected state, so one bad resource can prevent all other resources from being processed. Since Kubernetes is likely to restart the manager pod after a crash, the repeated restarts can also amount to a DoS of the Kubernetes API server.

In my project, I wrote this utility function:

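import (
	"fmt"
	"runtime/debug"

	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// MakeSafe wraps r so that a panic in Reconcile is returned as an error.
func MakeSafe(r reconcile.Reconciler) reconcile.Reconciler {
	return safeReconciler{impl: r}
}

type safeReconciler struct {
	impl reconcile.Reconciler
}

func (s safeReconciler) Reconcile(request reconcile.Request) (result reconcile.Result, err error) {
	defer func() {
		// Named results let the deferred function overwrite what the caller sees.
		if p := recover(); p != nil {
			result = reconcile.Result{}
			err = fmt.Errorf("panic: %v [recovered]\n\n%s", p, debug.Stack())
		}
	}()
	return s.impl.Reconcile(request)
}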

Every time I pass a reconciler to Complete, I wrap it with this. It ensures that any panics raised by the reconciler are converted to normal errors.
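
To illustrate the effect without pulling in controller-runtime, here is a minimal, self-contained sketch of the same wrapper. The `Request`, `Result`, and `Reconciler` types below are local stand-ins I made up to mirror the shape of the `reconcile` package types, and `flaky` is a hypothetical reconciler that panics on one specific resource; none of these names come from the library.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// Request, Result and Reconciler are local stand-ins (assumptions) that only
// mirror the shape of the reconcile package types used in the snippet above.
type Request struct{ Name string }

type Result struct{ Requeue bool }

type Reconciler interface {
	Reconcile(Request) (Result, error)
}

// MakeSafe wraps r so that a panic surfaces as an ordinary error.
func MakeSafe(r Reconciler) Reconciler { return safeReconciler{impl: r} }

type safeReconciler struct{ impl Reconciler }

func (s safeReconciler) Reconcile(req Request) (result Result, err error) {
	defer func() {
		if p := recover(); p != nil {
			result = Result{}
			err = fmt.Errorf("panic: %v [recovered]\n\n%s", p, debug.Stack())
		}
	}()
	return s.impl.Reconcile(req)
}

// flaky is a hypothetical reconciler that panics on one specific resource.
type flaky struct{}

func (flaky) Reconcile(req Request) (Result, error) {
	if req.Name == "bad" {
		var m map[string]int
		m["x"] = 1 // assignment to a nil map panics
	}
	return Result{}, nil
}

func main() {
	r := MakeSafe(flaky{})
	// The loop keeps going after the "bad" resource instead of crashing.
	for _, name := range []string{"good", "bad", "also-good"} {
		_, err := r.Reconcile(Request{Name: name})
		fmt.Printf("%s: failed=%t\n", name, err != nil)
	}
}
```

In this sketch only the "bad" request reports a failure, and the requests after it are still processed. Without the wrapper, the same loop would terminate the whole process at the second iteration.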

Metadata

Labels

help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/design: Categorizes issue or PR as related to design.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
