kubeadm stacked etcd #69486
Thanks for the PR @fabriziopandini. Added a couple of comments.
// advertise address
advertiseAddress := net.ParseIP(cfg.APIEndpoint.AdvertiseAddress)
if advertiseAddress == nil {
	return nil, fmt.Errorf("error parsing APIEndpoint AdvertiseAddress %v: is not a valid textual representation of an IP address", cfg.APIEndpoint.AdvertiseAddress)
probably better "%v:" to be "%q"?
done
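To illustrate the %v versus %q suggestion above, here is a minimal sketch. The helper name parseAdvertiseAddress is hypothetical and is not taken from the PR; only the error text follows the snippet under review. With %q the offending value is quoted, so an empty string or trailing whitespace stays visible in the message.

```go
package main

import (
	"fmt"
	"net"
)

// parseAdvertiseAddress is a hypothetical helper that mirrors the check under review.
func parseAdvertiseAddress(addr string) (net.IP, error) {
	ip := net.ParseIP(addr)
	if ip == nil {
		// %q quotes the value, so empty strings and stray whitespace stay visible.
		return nil, fmt.Errorf("error parsing APIEndpoint AdvertiseAddress %q: is not a valid textual representation of an IP address", addr)
	}
	return ip, nil
}

func main() {
	// The second and third inputs are invalid: an empty string and a trailing space.
	for _, addr := range []string{"10.0.0.1", "", "10.0.0.1 "} {
		if _, err := parseAdvertiseAddress(addr); err != nil {
			fmt.Println(err)
		}
	}
}
```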
cmd/kubeadm/app/phases/etcd/local.go
Outdated
}

// notifies the other members of the etcd cluster about the joining member
etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)
hm, did we investigate if this approach is safe?
what happens if the port is already taken on that endpoint?
That's the default peer port and "shouldn't collide" but we should definitely add a check here.
added check
cmd/kubeadm/app/phases/etcd/local.go
Outdated
// notifies the other members of the etcd cluster about the joining member
etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)

glog.V(1).Infof("Adding etcd Member %s", etcdPeerAddress)
"Member" -> "member:"
done
cmd/kubeadm/app/phases/etcd/local.go
Outdated
}
glog.V(1).Infof("Updated etcd member list %v", initialCluster)

glog.V(1).Infoln("creating local etcd static pod manifest file")
possibly "creating" should be uppercase for consistency?
There was a recent issue about golang 1.11 as well needing to use Infof vs. Infoln.
using Infoln
cmd/kubeadm/app/phases/etcd/local.go
Outdated
}

if len(initialCluster) == 0 {
	defaultArguments["initial-cluster"] = fmt.Sprintf("%s=https://%s:2380", cfg.GetNodeName(), cfg.APIEndpoint.AdvertiseAddress)
should we make such ports into constants?
Yes we should add them to the default consts file.
done in a separate PR to keep the scope of this PR as small as possible
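As a rough illustration of the "ports as constants" idea discussed above, the sketch below uses hypothetical constant and function names (they are not the ones from the follow-up PR); only the values 2379 and 2380, etcd's default client and peer ports, are facts. It also uses net.JoinHostPort, which brackets IPv6 literals, something a plain "https://%s:2380" format string would not do.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Hypothetical constant names; the values are etcd's default client and peer ports.
const (
	etcdListenClientPort = 2379
	etcdListenPeerPort   = 2380
)

// etcdPeerURL builds the peer URL for a host; net.JoinHostPort adds the
// square brackets that IPv6 literals need in a URL.
func etcdPeerURL(host string) string {
	return "https://" + net.JoinHostPort(host, strconv.Itoa(etcdListenPeerPort))
}

func main() {
	fmt.Println(etcdPeerURL("10.0.0.1"))
	fmt.Println(etcdPeerURL("fd00::1"))
}
```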
cmd/kubeadm/app/util/etcd/etcd.go
Outdated
ret := map[string]string{}
for _, m := range resp.Members {
	// fixes the entry for of the joining member (that doesn't have a name set in the initialCluster returned by etcd)
"for of" -> "for" or "of" only.
done
cmd/kubeadm/app/util/etcd/etcd.go
Outdated
func (c Client) AddMember(name string, peerAddrs string) (map[string]string, error) {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   c.Endpoints,
		DialTimeout: 5 * time.Second,
i wonder if the connection times increase with the number of etcd pods.
I would up the timeout for new connection. Default client inside the api-server is 20 seconds.
increased timeout to 20
Is the expectation that we've already copied over the etcd ca key/cert and generated the correct certificates?
cmd/kubeadm/app/phases/etcd/local.go
Outdated
@@ -36,10 +39,47 @@ const (
)

// CreateLocalEtcdStaticPodManifestFile will write local etcd static pod manifest file.
// This function is used by init (when there the etcd cluster is empty) or by kubeadm
"(when there the etcd cluster is empty)" => "(when the etcd cluster is empty)"
done
cmd/kubeadm/app/util/etcd/etcd.go
Outdated
}

// AddMember notifies an existing etcd cluster that a new member is joining
func (c Client) AddMember(name string, peerAddrs string) (map[string]string, error) {
I almost want this map[string]string to be a struct, something like etcdClusterConfiguration or similar. Do you think that would add anything here?
I don't feel strongly here b/c we're re-wrapping an etcd member struct. I would change it to [string]IP at a minimum though.
Also testability of this function will require using some of the etcd utils I built eons ago under apiserver/storage
done using a struct
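To make the map-versus-struct discussion above concrete, here is a small sketch of what a struct-based return value might look like. The type Member and the function initialCluster are hypothetical names chosen for illustration and do not come from the PR. The name=url,name=url output format is the one etcd's --initial-cluster flag accepts.

```go
package main

import (
	"fmt"
	"strings"
)

// Member is a hypothetical type standing in for the map[string]string entries.
type Member struct {
	Name    string
	PeerURL string
}

// initialCluster renders members in the name=url,name=url form accepted by
// etcd's --initial-cluster flag, preserving the order of the slice.
func initialCluster(members []Member) string {
	parts := make([]string, 0, len(members))
	for _, m := range members {
		parts = append(parts, fmt.Sprintf("%s=%s", m.Name, m.PeerURL))
	}
	return strings.Join(parts, ",")
}

func main() {
	members := []Member{
		{Name: "cp-1", PeerURL: "https://10.0.0.1:2380"},
		{Name: "cp-2", PeerURL: "https://10.0.0.2:2380"},
	}
	fmt.Println(initialCluster(members))
}
```

One practical difference from a map: a slice of structs keeps a stable order, whereas Go map iteration order is randomized, so the rendered flag value stays reproducible between runs.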
cmd/kubeadm/app/util/etcd/etcd.go
Outdated
	return nil, err
}

// Note for reviewers: I'm not sure this is the best method for getting the endpoint
I think this is a good approach with one minor concern: there is a race condition that the PodIP may change after the PodIP lookup and before our client access, but I'm ok living with that possibility.
I wonder if there is a clean way to solve this by managing etcd as a statefulset. The other thought I had was adding a Service object to the static manifest and managing that by hand (by kubeadm). The goal of these two ideas is to provide a stable DNS name instead of a PodIP lookup. Either way, those types of changes would be way out of scope for this PR.
So the ClusterStatus should have the endpoints.
Or as discussed on the call put the meta-data in an annotation for the pod to make it easier to extract.
+1 to using ClusterStatus or a pre-set annotation to use as a starting point.
After we have the initial endpoint list, we should use that to query the etcd cluster itself for the list of endpoints.
Potentially, we should also raise an error if the queried endpoints do not match the expected endpoints, or if the cluster is not in a healthy state to start.
done reading from cluster status + calling sync to get the real list of endpoints from etcd
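The suggestion above to raise an error when the queried endpoints do not match the expected ones can be sketched as a simple set comparison. This is only an illustration under assumptions: diffEndpoints is a hypothetical helper, and how the expected list (from ClusterStatus) and the actual list (from etcd) are obtained is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// diffEndpoints is a hypothetical helper. It returns the endpoints that are in
// expected but not in actual (missing) and those in actual but not in expected
// (unexpected). Both results are sorted for stable output.
func diffEndpoints(expected, actual []string) (missing, unexpected []string) {
	inActual := make(map[string]bool, len(actual))
	for _, e := range actual {
		inActual[e] = true
	}
	inExpected := make(map[string]bool, len(expected))
	for _, e := range expected {
		inExpected[e] = true
		if !inActual[e] {
			missing = append(missing, e)
		}
	}
	for _, e := range actual {
		if !inExpected[e] {
			unexpected = append(unexpected, e)
		}
	}
	sort.Strings(missing)
	sort.Strings(unexpected)
	return missing, unexpected
}

func main() {
	expected := []string{"https://10.0.0.1:2379", "https://10.0.0.2:2379"}
	actual := []string{"https://10.0.0.2:2379", "https://10.0.0.3:2379"}
	missing, unexpected := diffEndpoints(expected, actual)
	if len(missing) > 0 || len(unexpected) > 0 {
		fmt.Printf("endpoint mismatch: missing=%v unexpected=%v\n", missing, unexpected)
	}
}
```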
Bunch of comments, plus we should highlight that folks will need to do this on odd numbers for stacked deploys.
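The reason for recommending odd member counts is quorum arithmetic: a cluster of n members needs floor(n/2)+1 of them to agree, so going from an odd count to the next even count adds a member without tolerating any additional failure. A minimal sketch (the function names are illustrative):

```go
package main

import "fmt"

// quorum returns the number of members that must be available for a cluster
// of n members to make progress (a strict majority).
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can fail while a quorum remains.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```

For example, three members and four members both tolerate a single failure, while five members tolerate two.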
@@ -334,8 +338,8 @@ func GetEtcdPeerAltNames(cfg *kubeadmapi.InitConfiguration) (*certutil.AltNames,

 	// create AltNames with defaults DNSNames/IPs
 	altNames := &certutil.AltNames{
-		DNSNames: []string{cfg.NodeRegistration.Name, "localhost"},
-		IPs:      []net.IP{advertiseAddress, net.IPv4(127, 0, 0, 1), net.IPv6loopback},
+		DNSNames: []string{cfg.NodeRegistration.Name},
Why are you removing loopback from the SAN?
I'd have to dig through history but I remember us adding it on purpose.
sounds strange, but restored
cmd/kubeadm/app/phases/etcd/local.go
Outdated
if err != nil {
	return err
}
glog.V(1).Infof("Updated etcd member list %v", initialCluster)
We should do a local client member check, to verify it's correct, and to list the current members in the log line.
/assign @timothysc @detiber
etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)

glog.V(1).Infof("Adding etcd Member %s", etcdPeerAddress)
initialCluster, err := etcdClient.AddMember(cfg.NodeRegistration.Name, etcdPeerAddress)
Is this method used during upgrade? If so, it seems odd to call 'AddMember()' for an instance that is already a member of the etcd cluster.
No, this is used only when adding a new control plane instance
Force-pushed: bfba065 to 6e16274
@neolit123 @chuckha @timothysc @detiber Thanks for the valuable feedback! Now
The last open point to address before removing WIP is the use of the API server advertise address for etcd, which can lead to problems if the user chooses an advertise address that doesn't correspond to any IP address on the machine.
@fabriziopandini Is this ready to go? If so, can you remove the WIPs from this and other PRs?
@timothysc the last pending point is the discussion about using the API server advertise address for etcd
@fabriziopandini ^ want to update now? /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: fabriziopandini, timothysc
Force-pushed: 6e16274 to e5886e5
@timothysc this is ready to go for me now
/test pull-kubernetes-e2e-kops-aws
/retest
/lgtm cancel
Force-pushed: e5886e5 to fbd6d2d
/lgtm
What this PR does / why we need it:
kubeadm now automatically creates a new stacked etcd member when joining a new control plane node (does not apply to external etcd)
Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#1123
Special notes for your reviewer:
IMO two points deserve more attention:
I will keep this in WIP until there is agreement on the above points
Release note:
/sig cluster-lifecycle
/kind feature
/cc @timothysc
/cc @chuckha
/cc @detiber
@kubernetes/sig-cluster-lifecycle-pr-reviews