Description
Operating Environment
kops 1.8.0
AWS
k8s 1.8.4
What we expect to happen
Our kops spec looks like this:
```yaml
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-12-21T14:37:50Z
  name: ecom-2.dev.domain.com
spec:
  api:
    loadBalancer:
      type: Internal
  ...
  masterInternalName: api.internal.ecom-2.dev.domain.com
```
Per the documentation here, we expect kops to create and maintain a Route53 record for api.internal.ecom-2.dev.domain.com that points to the AWS load balancer created for the masters.
What actually happens
Instead, we get a Route53 record that is a list of A records pointing directly at the master instances.
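For reference, this is a minimal sketch of how we inspect the record (assuming boto3; the hosted zone ID is a placeholder for our environment). On our cluster it reports plain A records rather than the alias to the API load balancer that we expect:

```python
# Sketch: inspect the Route53 record for masterInternalName.
# HOSTED_ZONE_ID is a placeholder for our private hosted zone.
import boto3

HOSTED_ZONE_ID = "Z2ABCDEFGHIJKL"                    # hypothetical zone ID
RECORD_NAME = "api.internal.ecom-2.dev.domain.com."  # masterInternalName

route53 = boto3.client("route53")
resp = route53.list_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    StartRecordName=RECORD_NAME,
    StartRecordType="A",
    MaxItems="1",
)

for rrset in resp["ResourceRecordSets"]:
    if rrset["Name"] != RECORD_NAME:
        continue
    if "AliasTarget" in rrset:
        # Expected: an alias pointing at the internal API load balancer.
        print("alias ->", rrset["AliasTarget"]["DNSName"])
    else:
        # Observed: plain A records pointing directly at the master IPs.
        print("A records ->", [r["Value"] for r in rrset.get("ResourceRecords", [])])
```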
Why this matters
Our workloads experience failures because the nodes can no longer reach a master, most likely because they are using a master's IP address directly. When a master dies, workloads see failures doing lookups through kube-dns; our investigation shows this is because kube-dns on the node is timing out trying to reach the masters via the old IP.
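A rough way to see the mismatch is to compare what the internal name resolves to against the private IPs of the masters that are actually running. This is only a sketch: the region is a placeholder, and it assumes the standard kops tagging of master instances.

```python
# Sketch: compare what the internal API name resolves to with the private
# IPs of the masters currently running. The region and the tag filter are
# assumptions about a standard kops-provisioned cluster.
import socket
import boto3

RECORD_NAME = "api.internal.ecom-2.dev.domain.com"

_, _, resolved_ips = socket.gethostbyname_ex(RECORD_NAME)

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:k8s.io/role/master", "Values": ["1"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

master_ips = {
    inst["PrivateIpAddress"]
    for res in reservations
    for inst in res["Instances"]
}

stale = set(resolved_ips) - master_ips
if stale:
    print("DNS still points at replaced masters:", sorted(stale))
```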
We can temporarily fix the issue by manually updating the Route53 record to point to the load balancer. We've verified that if we do this, and THEN kill a master, the workloads are not affected.
The problem is that when the new master comes back online, the cluster rewrites the Route53 name back to A records, so the workaround will not survive rolling updates.
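For completeness, this is roughly the kind of manual change involved (a sketch, not our exact procedure; the zone IDs and load balancer DNS name below are placeholders): UPSERT the record as an alias to the internal API ELB. As noted above, kops reverts this on the next rolling update.

```python
# Sketch of the manual workaround: rewrite the record as an alias to the
# internal API load balancer. All IDs and names below are placeholders.
import boto3

HOSTED_ZONE_ID = "Z2ABCDEFGHIJKL"      # our private hosted zone (placeholder)
ELB_HOSTED_ZONE_ID = "Z35SXDOTRQ7X7K"  # the ELB's canonical hosted zone (placeholder)
ELB_DNS_NAME = "internal-api-ecom-2-dev-1234567890.us-east-1.elb.amazonaws.com"
RECORD_NAME = "api.internal.ecom-2.dev.domain.com."

boto3.client("route53").change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "point masterInternalName at the internal API ELB",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": ELB_HOSTED_ZONE_ID,
                        "DNSName": ELB_DNS_NAME,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    },
)
```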
I believe this is the root cause of #4247 and #3702, which I am going to close in favor of this issue.
I think this may also be the actual root cause of #2634. The solution mentioned there is exactly as I describe, but I suspect it would not survive a rolling update.
What we need
We need to know which part of our configuration causes the internal name to point at the masters directly, and how to fix it.
Alternatively, a workaround that simply prevents kops from updating the internal DNS record would be very helpful.