Upgrading etcd cluster version from v3.2.24 to v3.3.15 made the k8s cluster apparently frozen #12225
Comments
Do you enable auth? Please see #11689.
Thanks a lot for pointing me to that issue, @tangcong! I believe that is indeed what is happening in my cluster. I noticed that there are some logs like the following in my cluster's leader logs:
Which I guess indicates that auth is indeed enabled and that lease_revoke requests are being issued. I noticed that the only solution you proposed is to first upgrade to the latest 3.2 version, and only then upgrade to 3.3. Does that mean this cluster is in an unrecoverable state and I should just obliterate it, given that the entire cluster is already at v3.3? Also, I couldn't find the note you added related to this issue here. Shouldn't it be there?
If your clusters are already inconsistent, you can only remove the follower nodes one by one and then add them back to the cluster to make the cluster consistent. Note that there is no guarantee that your data is complete.

@gyuho Could you also release a new version for 3.2 when you release new versions for 3.4/3.3 that include the fix for the data inconsistency bug? Thanks.

@Leulz The etcd website doc has not been updated for a long time; I will see how to update it, thank you. The latest note is here.
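For reference, that remove-and-re-add cycle is normally done with etcdctl's member commands, roughly as sketched below. The endpoints, member name, peer URL, and data directory are placeholders, not taken from this issue; add the appropriate TLS flags for your setup. Doing this one follower at a time keeps quorum intact.

```
# 1. Find the member ID of the follower to recycle
ETCDCTL_API=3 etcdctl --endpoints=https://etcd-0.internal:2379 member list

# 2. Remove that follower from the cluster
ETCDCTL_API=3 etcdctl --endpoints=https://etcd-0.internal:2379 member remove <member-id>

# 3. On the removed machine: stop etcd and wipe its data directory
#    (e.g. /var/lib/etcd) so it rejoins with a fresh snapshot from the leader

# 4. Add it back as a new member, then start etcd on that machine with
#    --initial-cluster-state existing
ETCDCTL_API=3 etcdctl --endpoints=https://etcd-0.internal:2379 \
  member add etcd-2 --peer-urls=https://etcd-2.internal:2380
```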
etcd version: 3.3.15
k8s version: 1.16
I am using the etcd-wrapper to run etcd on dedicated machines.
When I upgraded the first machine, I noticed an absurd increase in CPU usage. The average CPU usage with v3.2.24 was at around 40% of an EC2 m3.medium. It jumped to more than 90% after the upgrade, and now, even using m3.large instances, it's still at around 50~60%.
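One rough way to confirm how much of that CPU is attributable to etcd itself is its Prometheus metrics endpoint; a sketch, assuming metrics are served on the client URL without TLS (the endpoint is a placeholder):

```
# Sample etcd's own cumulative CPU counter; compare readings over time
# across the v3.2 and v3.3 members
curl -s http://etcd-0.internal:2379/metrics | grep '^process_cpu_seconds_total'
```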
Alas, I decided to keep upgrading the cluster instead of rolling back, and now the k8s cluster is seemingly immutable. Thankfully it's in a staging environment.
The cluster reports itself as healthy:
But I noticed that the Raft Index is sometimes significantly different across all instances:
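(The original outputs aren't reproduced above, but checks of this kind can be made with etcdctl; the endpoints below are placeholders and TLS flags are omitted:)

```
# Per-endpoint health
ETCDCTL_API=3 etcdctl \
  --endpoints=https://etcd-0.internal:2379,https://etcd-1.internal:2379,https://etcd-2.internal:2379 \
  endpoint health

# Per-endpoint status, including Raft Term and Raft Index
ETCDCTL_API=3 etcdctl \
  --endpoints=https://etcd-0.internal:2379,https://etcd-1.internal:2379,https://etcd-2.internal:2379 \
  endpoint status --write-out=table
```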
Lots (as in, dozens per second) of logs like this can be seen:

```
Aug 15 19:57:10 internal-dns etcd-wrapper[2981]: 2020-08-15 19:57:10.849851 I | auth: deleting token <token> for user root
```

Other logs that look weird are:

```
auth: invalid user name etcd-2 for permission checking
```

```
pkg/fileutil: purged file /var/lib/etcd/member/snap/00000000000000ae-000000000e945a73.snap successfully
```

and lots of

```
etcdserver: read-only range request "key:\"/registry/pods/\" range_end:\"/registry/pods0\" " with result "range_response_count:808 size:13316569" took too long (127.988459ms) to execute
```

The k8s cluster using this etcd cluster is, as mentioned, apparently frozen. I tried editing a deployment we have in the cluster (the kind of edit is sketched after the listings below), and the result was:
Pods before the patch:
Pods after editing a deployment to force a cycle:
Pods after some time:
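(For context, a cycle like this is typically forced with something along the lines of the sketch below; the namespace and deployment name are placeholders, not the real ones from this cluster:)

```
# Force a rollout by restarting the deployment (available since kubectl 1.15),
# then watch whether the new pods ever come up
kubectl -n staging rollout restart deployment/my-app
kubectl -n staging rollout status deployment/my-app --timeout=120s
```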
Is this a known issue? Any insight into what is happening here is much appreciated.