Closed
Description
Describe the bug
If an NGF pod stops being the leader, it cannot become the leader again.
This becomes problematic when only one NGF pod is running. Because after it stops being the leader, this means it will not report any statuses. And since only one pod is running, this means no statuses will be reported at all (until the pod is restarted).
To Reproduce
The problem was observed when the pod lost connectivity to the k8s API server:
kubectl logs -n nginx-gateway <pod-name> -c nginx-gateway | grep leader
I0926 20:58:42.883382 6 leaderelection.go:250] attempting to acquire leader lease nginx-gateway/nginx-gateway-leader-election...
I0926 20:58:43.073317 6 leaderelection.go:260] successfully acquired lease nginx-gateway/nginx-gateway-leader-election
{"level":"info","ts":"2023-09-26T20:58:43Z","logger":"leaderElector","msg":"Started leading"}
E0927 08:09:20.830614 6 leaderelection.go:332] error retrieving resource lock nginx-gateway/nginx-gateway-leader-election: Get "https://10.64.0.1:443/apis/coordination.k8s.io/v1/namespaces/nginx-gateway/leases/nginx-gateway-leader-election?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0927 08:09:25.829736 6 leaderelection.go:332] error retrieving resource lock nginx-gateway/nginx-gateway-leader-election: Get "https://10.64.0.1:443/apis/coordination.k8s.io/v1/namespaces/nginx-gateway/leases/nginx-gateway-leader-election?timeout=5s": context deadline exceeded
I0927 08:09:25.830070 6 leaderelection.go:285] failed to renew lease nginx-gateway/nginx-gateway-leader-election: timed out waiting for the condition
{"level":"info","ts":"2023-09-27T08:09:25Z","logger":"leaderElector","msg":"Stopped leading"}
E0927 08:09:35.862628 6 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"nginx-gateway-leader-election.1788b315c7bd90e5", GenerateName:"", Namespace:"nginx-gateway", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Lease", Namespace:"nginx-gateway", Name:"nginx-gateway-leader-election", UID:"eb133a0d-7622-4b80-a0d1-d49755e52a1f", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"1044977", FieldPath:""}, Reason:"LeaderElection", Message:"nginx-gateway-b6cdb65cd-bt7zg stopped leading", Source:v1.EventSource{Component:"nginx-gateway-fabric-nginx", Host:""}, FirstTimestamp:time.Date(2023, time.September, 27, 8, 9, 25, 831766245, time.Local), LastTimestamp:time.Date(2023, time.September, 27, 8, 9, 25, 831766245, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"nginx-gateway-fabric-nginx", ReportingInstance:""}': 'Post "https://10.64.0.1:443/api/v1/namespaces/nginx-gateway/events?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)'(may retry after sleeping)
{"level":"info","ts":"2023-09-27T17:19:27Z","logger":"statusUpdater","msg":"Skipping updating Nginx Gateway status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:13Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:15Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:24Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:25Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
Expected behavior
- The pod becomes the leader again
Your environment
NKF: ,"version":"edge","commit":"8e57fe86d311d6a618afa109999d80439d5ca9e9","date":"2023-09-22T17:16:36Z"
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3-gke.100", GitCommit:"6466b51b762a5c49ae3fb6c2c7233ffe1c96e48c", GitTreeState:"clean", BuildDate:"2023-06-23T09:27:28Z", GoVersion:"go1.20.5 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}