Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start function of manager object not always returns error if it happens. #2873

Closed
Alexseij opened this issue Jul 4, 2024 · 1 comment
Closed
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Alexseij
Copy link
Contributor

Alexseij commented Jul 4, 2024

In my case error happened during initialization l object:

l, err := leaderelection.NewLeaderElector(leaderelection.LeaderElectionConfig{
Lock: cm.resourceLock,
LeaseDuration: cm.leaseDuration,
RenewDeadline: cm.renewDeadline,
RetryPeriod: cm.retryPeriod,
Callbacks: leaderelection.LeaderCallbacks{
OnStartedLeading: func(_ context.Context) {
if err := cm.startLeaderElectionRunnables(); err != nil {
cm.errChan <- err
return
}
close(cm.elected)
},
OnStoppedLeading: func() {
if cm.onStoppedLeading != nil {
cm.onStoppedLeading()
}
// Make sure graceful shutdown is skipped if we lost the leader lock without
// intending to.
cm.gracefulShutdownTimeout = time.Duration(0)
// Most implementations of leader election log.Fatal() here.
// Since Start is wrapped in log.Fatal when called, we can just return
// an error here which will cause the program to exit.
cm.errChan <- errors.New("leader election lost")
},
},
ReleaseOnCancel: cm.leaderElectionReleaseOnCancel,
Name: cm.leaderElectionID,
})
if err != nil {
return err
}

Error contains this message:
leaseDuration must be greater than renewDeadline

After that error, function of manager object Start dont return it.

Because this code call here by function startLeaderElection:

// Start the leader election and all required runnables.
{
ctx, cancel := context.WithCancel(context.Background())
cm.leaderElectionCancel = cancel
go func() {
if cm.resourceLock != nil {
if err := cm.startLeaderElection(ctx); err != nil {
cm.errChan <- err
}
} else {
// Treat not having leader election enabled the same as being elected.
if err := cm.startLeaderElectionRunnables(); err != nil {
cm.errChan <- err
}
close(cm.elected)
}
}()
}

After this defers:

stopComplete := make(chan struct{})
defer close(stopComplete)
// This must be deferred after closing stopComplete, otherwise we deadlock.
defer func() {
// https://hips.hearstapps.com/hmg-prod.s3.amazonaws.com/images/gettyimages-459889618-1533579787.jpg
stopErr := cm.engageStopProcedure(stopComplete)
if stopErr != nil {
if err != nil {
// Utilerrors.Aggregate allows to use errors.Is for all contained errors
// whereas fmt.Errorf allows wrapping at most one error which means the
// other one can not be found anymore.
err = kerrors.NewAggregate([]error{err, stopErr})
} else {
err = stopErr
}
}
}()

And this defers call another defers one of which waiting for value in channel:

defer func() {
// Cancel leader election only after we waited. It will os.Exit() the app for safety.
if cm.resourceLock != nil {
// After asking the context to be cancelled, make sure
// we wait for the leader stopped channel to be closed, otherwise
// we might encounter race conditions between this code
// and the event recorder, which is used within leader election code.
cm.leaderElectionCancel()
<-cm.leaderElectionStopped
}
}()

But this channel is never being close because close function calls only when there is no error when l object initializing

if err != nil {
return err
}
// Start the leader elector process
go func() {
l.Run(ctx)
<-ctx.Done()
close(cm.leaderElectionStopped)
}()
return nil

I fix this issue in my fork repository, if this issue will be accepted I can do pull request.

@Alexseij Alexseij changed the title Start manager function not always returns error if it happens. Start function of manager object not always returns error if it happens. Jul 4, 2024
@alvaroaleman
Copy link
Member

/kind bug
Please feel free to submit a fix

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 6, 2024
@Alexseij Alexseij closed this as completed Aug 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

3 participants