A way to force or reset consul CA root during leadership failure scenario. #6375
Labels: theme/certificates, theme/connect, theme/consul-vault, type/enhancement
While testing and evaluating Consul, we configured it with the Consul Connect CA's Vault provider, and things worked well. At one point, however, we assumed that we could empty Vault and that Consul would be able to set up or change the root CA that is baked into the Raft data. Once our test Consul cluster was in this state, the servers would come online and fall into a very fast loop, failing to establish leadership with the following error repeated on the server nodes:
After a lot of hunting through the docs and trying different ways to force a leader and get the certificate rolled or switched out, we ended up rebuilding the 3 server nodes to fix this. We learned our lesson: never touch the Vault PKI mounts that the Consul Connect CA uses, otherwise the cluster gets into this state and it doesn't seem possible to bring it back online. While the cluster is stuck electing a leader, it doesn't seem you can even work with a server node to attempt to fix the CA certificate or roll it over to a new one. It is also quite easy to end up here: all it takes is modifying the PKI mount in Vault that the Connect CA is configured to use.
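For reference, the PKI mounts in question are the ones named in the servers' Connect CA configuration. A minimal sketch of such a configuration follows; the address, token, and mount names are placeholders for illustration, not the values from our cluster:

```hcl
# Sketch of a Vault-backed Connect CA on a Consul server.
# Every value below is a placeholder, not taken from our setup.
connect {
  enabled     = true
  ca_provider = "vault"

  ca_config {
    address               = "https://vault.example.com:8200" # placeholder Vault address
    token                 = "<vault-token>"                  # placeholder Vault token
    root_pki_path         = "connect-root"                   # Vault PKI mount holding the root CA
    intermediate_pki_path = "connect-intermediate"           # Vault PKI mount for the intermediate CA
  }
}
```

Emptying or changing the mounts behind `root_pki_path` and `intermediate_pki_path` is the kind of change that put our cluster into the state described above.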
Are there any plans for a way to force, expunge, or otherwise get rid of the root CA in Consul in a scenario like this, so that a leader can be elected and things can run again? Possibly a way to "re-bootstrap" the Consul CA bits?