ETCD doesn't automatically load changes to ca bundles for peer-trusted-ca-file or trusted-ca-file #11555
Comments
server-ca.crt has two CAs in it.
Since server-ca.crt has two CAs, clients should be able to connect with certs signed by the second CA. Here I paste an example of a cert signed by the second CA. NOTE: These are throwaway credentials that are solely meant for the etcd team to be able to replicate the problem.
This cert key pair is signed by the second CA. Then you should be able to access the cluster using
But as you can see the server does not accept the request because the second cert in the trust bundle is not recognized as a valid CA. Curious if this is expected or a known issue? |
Note for anyone experiencing issues with this: etcd can handle CA bundles, but the etcd instances have to be explicitly restarted in order to pick up the new cert bundles. |
@relyt0925 why has this been closed? I'm experimenting with renewing CA certificates with etcd, and it is surprising to me that new peer/server certs are picked up dynamically but CA certificates are not. |
I can reopen as this hasn't been solved. I worked around it by triggering restarts whenever CAs were updated! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
Please not stale bot |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
Not stale. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
Not stale |
Signed-off-by: Karuppiah Natarajan <karuppiahn@vmware.com>
Signed-off-by: Karuppiah Natarajan <karuppiahn@vmware.com> Signed-off-by: Karuppiah Natarajan <karuppiah7890@users.noreply.github.com>
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
Still not stale |
I've been looking at solving this in a system I run today. Currently, the next best thing we have is to have a process monitor the CA bundle on disk and then coordinate on gracefully restarting the etcd members - coordinating via etcd itself. This is obviously risky. It usually looks something like:
Naturally this would be far, far better if it just happened automagically. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
This is not stale |
@gyuho @xiang90 (or one of the other maintainers) - Is there a way to permanently disable the stale bot for this issue? This is a very real issue that recently hit us in our move to cycle our root certs in our environments. Working around this issue requires a lot of orchestration complexity and is time intensive. Additionally, it violates the principle of least surprise given that for most other certificate changes, etcd will transparently load them from disk during the next client connection (#7784). However, in this rare case (CA bundles) etcd does not appear to do this and has the potential to cause downtime during CA cutovers. |
@aauren Contributions are welcomed. |
Definitely willing to take a look, but it would be helpful if the stale bot wasn't constantly trying to close this issue so that it can be properly tracked. It's already happened once before: #10400 |
I have started the discussion about the bot in #13775 and need to write a proposal for the issue triage process. Maintainers would mark issue as Problem is that this issue has not been looked at by any contributor/maintainer, meaning that the issue is not critical enough for someone to be willing to spend time fixing it. |
Ok... After a bit of poking around I think I see why CA certificates aren't reloaded on new connections the same way that certs and client certs are.
The config object for crypto tls allows GetCertificate and GetClientCertificate to be function-based callbacks: https://github.com/golang/go/blob/master/src/crypto/tls/common.go#L557
etcd implements those and uses them to get a fresh copy of the cert and key file from the filesystem each time a new client connection is initiated: https://github.com/etcd-io/etcd/blob/main/client/pkg/transport/listener.go#L408
However, the config object does not expose a similar function-based callback for loading CA certificates. For these it only exposes a single attribute, https://github.com/golang/go/blob/master/src/crypto/tls/common.go#L638, which is set up when the config is created.
The only way that I can see to work around this without changing the flow completely would be to implement the
Never mind, I see that #13307 does exactly that and is already in the process of being reviewed. I'm not sure how I missed that, or why it wasn't recommended, given that it is an almost complete option, instead of asking for a contribution. Anyway, I'll monitor the progress of that PR. |
FTR: We had related discussion in #13902:
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
Not stale. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
Not stale |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
not stale, would love to see #13307 or a similar version merged. |
Etcd does not automatically pick up changes to CA certificates. This commit adds an automatic restart step on certificate change. See etcd-io/etcd#11555 for context. Signed-off-by: Jean-Tiare Le Bigot <jean-tiare.lebigot@resilience.care>
hey! yeah this is a must have. #16500 looks promising, but needs CR. |
Etcd cannot handle cert bundles in the peer-trusted-ca-file or trusted-ca-file section. Without the ability to handle CA bundles, it is impossible to take a zero-downtime approach to CA rotation without re-signing all active client and server certs at once.
If a CA bundle were allowed: a new CA could be created and made valid in all components in the first iteration. Then client certs can be re-signed with the new CA, since the server components have the new CA plus the old CA in their trust bundle. Once all clients have been re-signed and have downloaded the old + new CA, the server components can be signed with the new CA, and then the old CA can be effectively removed.
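The property this rotation plan depends on can be checked in isolation: a trust pool holding both the old and the new CA should accept a certificate signed by either. The following Go sketch generates two throwaway CAs in memory, issues a client certificate from the second one, and verifies it against a pool with only the old CA and then against a pool with both. The helpers `newCA`, `newClientCert`, and `trusted` are hypothetical names for this example, and it exercises `crypto/x509` directly rather than an etcd server.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a throwaway self-signed CA certificate and its key.
func newCA(name string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: name},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newClientCert issues a client-auth certificate signed by the given CA.
func newClientCert(ca *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "client"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// trusted reports whether cert chains to any of the given root CAs.
func trusted(cert *x509.Certificate, roots ...*x509.Certificate) bool {
	pool := x509.NewCertPool()
	for _, r := range roots {
		pool.AddCert(r)
	}
	_, err := cert.Verify(x509.VerifyOptions{
		Roots:     pool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err == nil
}

func main() {
	oldCA, _, err := newCA("old-ca")
	if err != nil {
		panic(err)
	}
	secondCA, secondKey, err := newCA("new-ca")
	if err != nil {
		panic(err)
	}
	client, err := newClientCert(secondCA, secondKey)
	if err != nil {
		panic(err)
	}
	fmt.Println("trusted with old CA only:", trusted(client, oldCA))
	fmt.Println("trusted with old + new CA:", trusted(client, oldCA, secondCA))
}
```

The first check is expected to fail and the second to succeed, which corresponds to the intermediate rotation step where servers trust both CAs while clients are being re-signed.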
It appears this was meant to be fixed but I am able to replicate the issue in an etcd deployment today.
I will expose all the certs and command line configurations in this issue so the exact steps can be replicated.