Skip to content

Conversation

@yashhhguptaaa
Copy link
Contributor

@yashhhguptaaa yashhhguptaaa commented Jan 14, 2026

Summary

  • Add panic recovery to certificate renewer worker
  • Add panic recovery to certificate provisioner worker
  • Follows existing pattern used in Slack sender (pkg/slack/sender.go)

Problem

When an ACME worker (Renewer or Provisioner) panics, it crashes the entire application because:

  1. Workers run inside errgroup - if one fails, everything stops
  2. Neither worker had defer recover() to catch panics

Solution

Wrap each work unit in an anonymous function with defer recover():

  func() {
      defer func() {
          if rec := recover(); rec != nil {
              p.logger.ErrorCtx(ctx, "panic recovered...",
                  log.String("panic", fmt.Sprintf("%v", rec)))
          }
      }()
      // work that might panic
  }() 

Now if a panic occurs:

  • It gets caught and logged
  • The worker continues running
  • Other services (API, mailer, etc.) are unaffected

Files Changed

  • pkg/certmanager/renewer.go - Added panic recovery to Run loop
  • pkg/certmanager/provisioner.go - Added panic recovery to Run loop

Test Plan

  • go build ./... passes
  • go vet ./pkg/certmanager/... passes
  • Manual test: Add temporary panic() call, verify it's caught and logged

Closes #600


Summary by cubic

Add panic recovery to ACME certificate renewer and provisioner workers to prevent app-wide crashes. Panics are caught and logged so workers and other services keep running.

  • Bug Fixes
    • Wrap initial and periodic checks in deferred recover blocks in both workers.
    • Log recovered panic details; mirrors the pattern in pkg/slack/sender.go.
    • Prevent errgroup from tearing down the app when a worker panics.

Written for commit 6077b92. Summary will update on new commits.

Signed-off-by: Yash Gupta <yash3144@gmail.com>
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ACME worker panic must not crash the complete application

1 participant