Statuses not reported because no leader gets elected #1100

Closed
@pleshakov

Describe the bug
If an NGF pod stops being the leader, it cannot become the leader again.
This is a problem when only one NGF pod is running: once that pod stops being the leader, it no longer reports any statuses, and because no other pod can take over, no statuses are reported at all until the pod is restarted.

To Reproduce
The problem was observed when the pod lost connectivity to the k8s API server:

kubectl logs -n nginx-gateway <pod-name> -c nginx-gateway | grep leader
I0926 20:58:42.883382       6 leaderelection.go:250] attempting to acquire leader lease nginx-gateway/nginx-gateway-leader-election...
I0926 20:58:43.073317       6 leaderelection.go:260] successfully acquired lease nginx-gateway/nginx-gateway-leader-election
{"level":"info","ts":"2023-09-26T20:58:43Z","logger":"leaderElector","msg":"Started leading"}
E0927 08:09:20.830614       6 leaderelection.go:332] error retrieving resource lock nginx-gateway/nginx-gateway-leader-election: Get "https://10.64.0.1:443/apis/coordination.k8s.io/v1/namespaces/nginx-gateway/leases/nginx-gateway-leader-election?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0927 08:09:25.829736       6 leaderelection.go:332] error retrieving resource lock nginx-gateway/nginx-gateway-leader-election: Get "https://10.64.0.1:443/apis/coordination.k8s.io/v1/namespaces/nginx-gateway/leases/nginx-gateway-leader-election?timeout=5s": context deadline exceeded
I0927 08:09:25.830070       6 leaderelection.go:285] failed to renew lease nginx-gateway/nginx-gateway-leader-election: timed out waiting for the condition
{"level":"info","ts":"2023-09-27T08:09:25Z","logger":"leaderElector","msg":"Stopped leading"}
E0927 08:09:35.862628       6 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"nginx-gateway-leader-election.1788b315c7bd90e5", GenerateName:"", Namespace:"nginx-gateway", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Lease", Namespace:"nginx-gateway", Name:"nginx-gateway-leader-election", UID:"eb133a0d-7622-4b80-a0d1-d49755e52a1f", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"1044977", FieldPath:""}, Reason:"LeaderElection", Message:"nginx-gateway-b6cdb65cd-bt7zg stopped leading", Source:v1.EventSource{Component:"nginx-gateway-fabric-nginx", Host:""}, FirstTimestamp:time.Date(2023, time.September, 27, 8, 9, 25, 831766245, time.Local), LastTimestamp:time.Date(2023, time.September, 27, 8, 9, 25, 831766245, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"nginx-gateway-fabric-nginx", ReportingInstance:""}': 'Post "https://10.64.0.1:443/api/v1/namespaces/nginx-gateway/events?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)'(may retry after sleeping)
{"level":"info","ts":"2023-09-27T17:19:27Z","logger":"statusUpdater","msg":"Skipping updating Nginx Gateway status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:13Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:15Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:24Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:25Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}

Expected behavior

  • The pod becomes the leader again

Your environment

NGF: "version":"edge","commit":"8e57fe86d311d6a618afa109999d80439d5ca9e9","date":"2023-09-22T17:16:36Z"
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3-gke.100", GitCommit:"6466b51b762a5c49ae3fb6c2c7233ffe1c96e48c", GitTreeState:"clean", BuildDate:"2023-06-23T09:27:28Z", GoVersion:"go1.20.5 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

Labels: bug, refined, size/medium