leadership election broken when pods are not cleaned up #412

Open
owenhaynes opened this issue Jan 5, 2023 · 1 comment
Labels
enhancement (New feature or request), injector (Area: mutating webhook service)

Comments

@owenhaynes

Describe the bug
If a pod is left behind as a k8s resource in a state other than RUNNING, the leadership configmap stays owned by the exited pod. Leader election is then unable to take place, so the webhook certificate expires.

A pod can be in a state other than RUNNING for many reasons. Here are some:

  1. Spot instances (pods can be recorded as terminated).
  2. Evicted by the control plane due to resource QoS.

To Reproduce
Deploy the vault-injector deployment to a spot instance node and let it be evicted. Wait for the certificate renewal time to pass, then check that the certificate has not been updated.

Expected behavior
Leader election should continue working even though pods have not been cleaned up, so that the certificate is updated.

Environment

  • Kubernetes version: 1.24 GKE
  • vault-k8s version: 1.1.0

Additional context
Leader election should switch to using the Lease resources provided by k8s, or something more robust.
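To illustrate what a lease-style scheme would change, here is a minimal toy sketch in Python. It is not vault-k8s code and does not talk to a cluster. The field names loosely mirror the `holderIdentity`, `leaseDurationSeconds`, and `renewTime` fields of a `coordination.k8s.io/v1` Lease. The `Lease` class, `try_acquire_or_renew`, and the pod names are hypothetical. A real implementation would also need optimistic concurrency on updates, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Toy stand-in for the spec of a coordination.k8s.io/v1 Lease (hypothetical model)."""
    holder_identity: Optional[str] = None
    lease_duration_seconds: float = 15.0
    renew_time: float = 0.0


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` holds the lease after this attempt."""
    expired = now >= lease.renew_time + lease.lease_duration_seconds
    if lease.holder_identity == candidate:
        # Current holder: renew by bumping the timestamp.
        lease.renew_time = now
        return True
    if lease.holder_identity is None or expired:
        # Nobody holds the lease, or the holder stopped renewing: take over.
        lease.holder_identity = candidate
        lease.renew_time = now
        return True
    # Another candidate holds an unexpired lease.
    return False
```

In this model, leadership moves once the holder stops renewing, regardless of whether its Pod object still exists. For example, if `injector-a` acquires at `now=0.0` and is then evicted, `injector-b` is refused at `now=5.0` but succeeds at `now=20.0`.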

@owenhaynes added the bug label on Jan 5, 2023
@tvoran
Member

tvoran commented Jan 12, 2023

Hi @owenhaynes, thanks for the info here. The current scheme is indeed relying on garbage collection to switch leadership, so it may not work in all cases. We chose it to avoid split-brain scenarios that we ran into with the previous scheme, which was more lease-based (though it pre-dated the advent of the lease resource in k8s). We may eventually be able to move to something else, but the current scheme has been much more stable so far.
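The failure mode of a garbage-collection-based scheme can be shown with a small toy model in Python. This is only a sketch under assumptions, not the vault-k8s implementation: it assumes the leader record is removed only when the Pod object that owns it is deleted, as the issue describes. `Cluster`, `garbage_collect`, `try_become_leader`, and the pod names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cluster:
    """Toy cluster state: Pod name -> phase, plus one leader record (hypothetical model)."""
    pods: Dict[str, str] = field(default_factory=dict)
    leader_owner: Optional[str] = None  # Pod that owns the leader record


def garbage_collect(cluster: Cluster) -> None:
    """Drop the leader record only when its owning Pod object no longer exists."""
    if cluster.leader_owner is not None and cluster.leader_owner not in cluster.pods:
        cluster.leader_owner = None


def try_become_leader(cluster: Cluster, candidate: str) -> bool:
    """Create the leader record if none exists; never replace an existing one."""
    garbage_collect(cluster)
    if cluster.leader_owner is None:
        cluster.leader_owner = candidate
    return cluster.leader_owner == candidate


cluster = Cluster(pods={"injector-a": "Running", "injector-b": "Running"})
try_become_leader(cluster, "injector-a")   # injector-a creates the record
cluster.pods["injector-a"] = "Failed"      # evicted, but the Pod object remains
try_become_leader(cluster, "injector-b")   # refused: the record still has an owner
del cluster.pods["injector-a"]             # Pod object is finally deleted
try_become_leader(cluster, "injector-b")   # succeeds after garbage collection
```

Because there is never more than one record and it is never overwritten, this model cannot produce two leaders at once. The cost is that an evicted Pod whose object lingers keeps the record indefinitely.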

In the meantime, I'd recommend using something like cert-manager to manage the certs for you, which would remove the need for any leadership electing in vault-k8s: https://developer.hashicorp.com/vault/docs/platform/k8s/helm/examples/injector-tls-cert-manager

@tvoran added the enhancement and injector labels and removed the bug label on Jan 12, 2023