The value of "datacenter" for TLS ClientHello in a WAN setup is not dynamic? #5357
Comments
It's important to note that this should not be related to:
We have
This sentence is not actually true: we're leaving it to the TLS lib to verify (based on all the existing rules x509 dictates). Including, but not limited to:
and so on. So we are leaving it up to the TLS lib to do its thing, thus this block is probably not something we want to do only if
When
Thanks for reporting @splashx! I tried to reproduce your issue, but everything is fine for me, independently from setting
Your conclusion is also not quite right, consul has its own cert checking: Lines 232 to 278 in a093af3
It doesn't check
We only want to do it when
Is there anything else that could help me reproduce your issue?
Thanks for the quick response @i0rek
Steps to reproduce
Thank you @mbag for your detailed response! I can now reproduce your issue! You found a bug: disabling
Another fix for your setup would be to remove the server_name from your config. I will think about how to properly fix this on and report back here.
Overview of the Issue
We're finishing up the deployment of a two-DC setup, with 5 servers in each DC, and we're at the final stage where we're bootstrapping TLS. We have managed to reach a healthy status for each DC (LAN), but when it comes to WAN, the two clusters can't talk to each other: gossip works, but grpc doesn't.
We see the following in one of the clusters:
But:
my-dc1/my-dc2 - we use a string in the format [a-z]{1,2}\-[a-z]{1,7}[0-9]+.
Reproduction Steps
sd.example.com
Set the datacenter config variable to my-dc1 and my-dc2 on the 2 clusters
Set server_name to server.my-dc1.sd.example.com on each node in my-dc1
Set server_name to server.my-dc2.sd.example.com on each node in my-dc2
NOTE: the certificate creation steps were based on the manual.
Consul info for both Client and Server
We see the following error message on my-dc1 (the IP addresses in 10.125.25.0/24 are from my-dc2):
We dug a bit deeper (a.k.a. tcpdump) to try to decipher what bad certificate would really mean, and we noticed that when nodes from my-dc1 try to contact nodes of my-dc2, they send a ClientHello message with the server_name value of server.my-dc1.<our_domain>. Obviously that will fail, because the certificates of the nodes from my-dc2 have:
neither a CN containing server.my-dc1.<our_domain>
nor a subjectAltName containing server.my-dc1.<our_domain>
And thus the TLS handshake fails.
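The mismatch can be reproduced outside Consul. The following is a minimal sketch, not Consul's own verification code: it assumes a throwaway self-signed certificate (built by the hypothetical helper newServerCert) whose subjectAltName only lists the my-dc2 name, and checks both names against it with the standard library's hostname verification.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newServerCert is a hypothetical helper for this illustration. It creates a
// short-lived self-signed certificate whose subjectAltName holds exactly the
// given DNS names. It is not how the certificates in the issue were issued.
func newServerCert(dnsNames ...string) (*x509.Certificate, error) {
	if len(dnsNames) == 0 {
		return nil, errors.New("at least one DNS name is required")
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: dnsNames[0]},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	// Self-signed: the template is used as both certificate and parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// Certificate as presented by a server in my-dc2.
	cert, err := newServerCert("server.my-dc2.sd.example.com")
	if err != nil {
		panic(err)
	}
	// The first name is what a my-dc1 node puts into server_name;
	// the second is what the my-dc2 certificate actually covers.
	for _, name := range []string{
		"server.my-dc1.sd.example.com",
		"server.my-dc2.sd.example.com",
	} {
		if err := cert.VerifyHostname(name); err != nil {
			fmt.Printf("%s: rejected (%v)\n", name, err)
		} else {
			fmt.Printf("%s: accepted\n", name)
		}
	}
}
```

Only the my-dc2 name is accepted, which matches the handshake failure described above when the client keeps using its local datacenter's name.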
To solve this problem we had to reissue all the certificates to contain all DCs in the subjectAltName. This is a problem because for every cluster added in a new DC we need to reissue all certificates to include that new DC in the subjectAltName.
I suppose this is not the desired behavior. IMHO, when contacting an IP address of another DC, consul, when acting as a TLS client, should dynamically change the server_name value to match server.<dc_name>.<consul_domain>.
.Operating system and Environment details
Consul 1.2.3, Ubuntu 16.04