Describe the bug
If a pod is left behind as a Kubernetes resource in a state other than RUNNING, the leadership ConfigMap stays owned by the exited pod. Leader election cannot take place, so the webhook certificate is never renewed and eventually expires.
A pod can end up in a state other than RUNNING for many reasons. Here are some:
Spot instance preemption (pods can be recorded as terminated)
Eviction by the control plane due to resource QoS
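To illustrate the failure mode described above, here is a minimal in-memory Python model. It is a hypothetical simplification, not vault-k8s code: it assumes, as the report describes, that the leader is whichever pod owns the leadership ConfigMap and that the ConfigMap disappears only when garbage collection runs after the owning Pod *object* is deleted. The pod names `injector-a` and `injector-b` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Pod:
    name: str
    phase: str  # e.g. "Running" or "Failed" (evicted pods are left with phase "Failed")


@dataclass
class Cluster:
    """Hypothetical model of the ConfigMap-ownership election scheme."""
    pods: Dict[str, Pod] = field(default_factory=dict)
    # Pod that owns the leadership ConfigMap; None means the ConfigMap does not exist.
    configmap_owner: Optional[str] = None

    def try_become_leader(self, candidate: str) -> bool:
        # A candidate becomes leader only by creating the ConfigMap,
        # which succeeds only when no ConfigMap exists yet.
        if self.configmap_owner is None:
            self.configmap_owner = candidate
            return True
        # The owner's phase is never checked, so a Failed owner still "leads".
        return self.configmap_owner == candidate

    def delete_pod(self, name: str) -> None:
        # Deleting the Pod object lets garbage collection remove what it owns.
        self.pods.pop(name, None)
        if self.configmap_owner == name:
            self.configmap_owner = None


cluster = Cluster()
cluster.pods["injector-a"] = Pod("injector-a", "Running")
cluster.try_become_leader("injector-a")

# Spot preemption or eviction: the pod stops running, but its object remains.
cluster.pods["injector-a"].phase = "Failed"
cluster.pods["injector-b"] = Pod("injector-b", "Running")
print(cluster.try_become_leader("injector-b"))  # False: leadership is stuck
```

In this model, leadership only moves once something deletes the failed Pod object, which matches the observed behavior: until the exited pod is cleaned up, no running replica can take over and renew the certificate.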
To Reproduce
Deploy the vault-injector deployment to a spot instance node, let the pod be evicted, wait for the certificate renewal time to pass, and observe that the certificate has not been updated.
Expected behavior
Leader election should keep working even when exited pods have not been cleaned up, so that the certificate is still renewed.
Environment
Kubernetes version: 1.24 GKE
vault-k8s version: 1.1.0
Additional context
Leader election should switch to using the Lease resources provided by Kubernetes, or to another more robust mechanism.
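For comparison, a lease-based scheme hands leadership over on a timeout instead of waiting for garbage collection. The sketch below is a hypothetical single-process model, not client-go's `leaderelection` package and not vault-k8s code; its field names only loosely mirror the `coordination.k8s.io/v1` LeaseSpec (`holderIdentity`, `leaseDurationSeconds`, `renewTime`, `leaseTransitions`), and time is passed in explicitly as a number of seconds so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    # Loosely mirrors coordination.k8s.io/v1 LeaseSpec fields (simplified).
    holder_identity: Optional[str] = None
    lease_duration_seconds: float = 15.0
    renew_time: float = 0.0
    lease_transitions: int = 0

    def expired(self, now: float) -> bool:
        return (self.holder_identity is None
                or now >= self.renew_time + self.lease_duration_seconds)


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` holds the lease after this call.

    A real implementation must write the Lease back with an optimistic-concurrency
    update (a resourceVersion check) so two candidates cannot both win; this
    single-process sketch skips that step.
    """
    if lease.holder_identity == candidate:
        lease.renew_time = now  # the current holder keeps renewing
        return True
    if not lease.expired(now):
        return False  # another holder still has a live lease
    # The previous holder stopped renewing (e.g. its pod was evicted): take over.
    lease.holder_identity = candidate
    lease.renew_time = now
    lease.lease_transitions += 1
    return True


lease = Lease(lease_duration_seconds=15.0)
try_acquire_or_renew(lease, "injector-a", now=0.0)
# injector-a is evicted and stops renewing; its Pod object may linger indefinitely.
print(try_acquire_or_renew(lease, "injector-b", now=10.0))  # False: lease still live
print(try_acquire_or_renew(lease, "injector-b", now=15.0))  # True: lease expired
```

The takeover no longer depends on the old Pod object being deleted, only on it failing to renew. The cost is that correctness now rests on timing: a holder that is paused rather than dead must stop acting as leader before its lease can expire, or two replicas may briefly both believe they lead, which is the split-brain risk raised in the reply below.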
Hi @owenhaynes, thanks for the info here. The current scheme does indeed rely on garbage collection to switch leadership, so it may not work in all cases. We chose it to avoid split-brain scenarios that we ran into with the previous scheme, which was more lease-based (though it predated the Lease resource in Kubernetes). We may eventually move to something else, but so far the current scheme has been much more stable.