
Weave Net Daemonset fails to restart pod due to existing dummy interface #3414

Open
0x2c7 opened this issue Sep 26, 2018 · 16 comments · May be fixed by #3459

@0x2c7

0x2c7 commented Sep 26, 2018

What you expected to happen?

The Weave Net DaemonSet, which controls the weave-net pod on each node, should be able to restart the pod on a node when it fails or stops for any reason.

What happened?

The weave-net pod goes into Error and CrashLoopBackOff status and is unable to function again until I terminate that node.

How to reproduce it?

SSH into a node and use the docker command to kill the weave-net container. Of course, this is just to reproduce the issue; in fact, on our production cluster we sometimes hit situations where weave-net crashes on a node and we don't know why.

The logs show that weave-net fails to create the dummy interface:

FATA: 2018/09/26 05:08:04.369497 creating dummy interface: file exists

I did a small investigation, and it looks like the bug comes from net/bridge.go, in the function initPrep, at these lines:

	dummy := &netlink.Dummy{LinkAttrs: netlink.NewLinkAttrs()}
	dummy.LinkAttrs.Name = "vethwedu"
	if err = netlink.LinkAdd(dummy); err != nil {
		return errors.Wrap(err, "creating dummy interface")
	}

During startup, weave-net creates this dummy interface, and when my pod restarts, the interface already exists, as checked with the ip link | grep vethwedu command:

96782: vethwedu: <BROADCAST,NOARP> mtu 1376 qdisc noop state DOWN mode DEFAULT group default qlen 1000

It looks like the previous weave-net session failed to delete this dummy interface, or was killed before deleting it. When I delete the dummy manually with ip link delete vethwedu, the pod runs smoothly and is back to normal.

Adding a small check that deletes the dummy if it already exists, before creating a new one, would solve this problem. Is that a good solution? If so, I'll open a PR.

	if existingDummy, err := netlink.LinkByName("vethwedu"); err == nil {
		if err := netlink.LinkDel(existingDummy); err != nil {
			return errors.Wrap(err, "deleting existing dummy interface")
		}
	}
	// ...

Anything else we need to know?

We run our Kubernetes cluster on AWS, using kops.

Versions:

$ weave version: 2.4.1

$ docker version
Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:09:56 2017
 OS/Arch:      linux/amd64

$ uname -a
Linux ip-172-50-52-229 4.4.121-k8s #1 SMP Sun Mar 11 19:39:47 UTC 2018 x86_64 GNU/Linux

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Logs:

$ kubectl logs -n kube-system <weave-net-pod> weave

DEBU: 2018/09/26 05:08:04.266053 [kube-peers] Checking peer "7a:1f:1c:b2:b7:7e" against list &{[{ea:43:25:46:a3:95 ip-172-50-61-167.ap-southeast-2.compute.internal} {16:ac:8c:71:55:20 ip-172-50-80-215.ap-southeast-2.compute.internal} {3e:af:c2:26:08:00 ip-172-50-110-136.ap-southeast-2.compute.internal} {2e:04:f7:c2:42:71 ip-172-50-32-17.ap-southeast-2.compute.internal} {32:4d:7c:65:31:8d ip-172-50-50-186.ap-southeast-2.compute.internal} {ae:92:06:fd:b3:e7 ip-172-50-93-2.ap-southeast-2.compute.internal} {7e:69:df:8f:8f:17 ip-172-50-41-197.ap-southeast-2.compute.internal} {62:aa:0a:e8:65:96 ip-172-50-109-73.ap-southeast-2.compute.internal} {f6:30:3e:1d:4b:8b ip-172-50-113-192.ap-southeast-2.compute.internal} {7a:1f:1c:b2:b7:7e ip-172-50-52-229.ap-southeast-2.compute.internal} {12:b7:c5:3d:f4:82 ip-172-50-67-61.ap-southeast-2.compute.internal} {aa:0d:6b:9c:56:9b ip-172-50-44-14.ap-southeast-2.compute.internal} {82:e9:f1:ce:c5:29 ip-172-50-58-155.ap-southeast-2.compute.internal} {26:a8:11:0d:76:e2 ip-172-50-33-242.ap-southeast-2.compute.internal}]}
INFO: 2018/09/26 05:08:04.297726 Command line options: map[ipalloc-range:100.96.0.0/11 port:6783 docker-api: expect-npc:true host-root:/host nickname:ip-172-50-52-229.ap-southeast-2.compute.internal conn-limit:100 db-prefix:/weavedb/weave-net http-addr:127.0.0.1:6784 metrics-addr:0.0.0.0:6782 name:7a:1f:1c:b2:b7:7e datapath:datapath ipalloc-init:consensus=14 no-dns:true]
INFO: 2018/09/26 05:08:04.297772 weave  2.3.0
FATA: 2018/09/26 05:08:04.369497 creating dummy interface: file exists
@murali-reddy
Contributor

Thanks @nguyenquangminh0711 for reporting this issue.

It's strange that your cluster ended up in this situation. As you may have seen in the code, the dummy interface is created and immediately deleted. Do you happen to see any reason in the weave pod logs why the dummy interface failed to get deleted?

It's a corner case, but it's good to handle it gracefully. Please submit the PR. Maybe you can simplify it to:

if err = netlink.LinkAdd(dummy); err != nil && err != syscall.EEXIST {
	return errors.Wrap(err, "creating dummy interface")
}

@0x2c7
Author

0x2c7 commented Sep 26, 2018

Hi @murali-reddy, thanks for your quick reply. I have just captured the logs of the previous session:

FATA: 2018/09/26 06:41:38.227346 setting dummy interface master: exchange full

It looks like it crashes at these lines, right before the dummy interface is deleted. That's why the interface is left behind for the next restart.

	if err := netlink.LinkSetMasterByIndex(dummy, b.bridge.Attrs().Index); err != nil {
		return errors.Wrap(err, "setting dummy interface master")
	}
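
For context, "exchange full" is the message for the EXFULL errno, which the bridge code returns when it has no free port numbers left for new attachments. As a rough illustration only (not Weave's actual code), assuming the same b and dummy variables as the snippet above and that the netlink library surfaces the raw errno, that case could be given a clearer message:

	if err := netlink.LinkSetMasterByIndex(dummy, b.bridge.Attrs().Index); err != nil {
		// EXFULL ("exchange full") means the bridge has run out of port slots,
		// i.e. too many veths are already attached to it.
		if err == syscall.EXFULL {
			return errors.Wrap(err, "weave bridge has no free ports left for the dummy interface")
		}
		return errors.Wrap(err, "setting dummy interface master")
	}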

@rade
Member

rade commented Sep 26, 2018

The LinkDel in that code really should be done in a defer right after creation. Though we still need to handle the presence of the interface gracefully in any case, since a tiny window will remain in which the interface would survive if the process is killed.
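
As an illustration of both points (tolerating a leftover interface and cleaning up via defer), here is a minimal sketch in the same initPrep context as the snippets above; this is not the eventual patch:

	// Remove any dummy left over from a previous run that crashed before cleanup.
	if old, err := netlink.LinkByName("vethwedu"); err == nil {
		if err := netlink.LinkDel(old); err != nil {
			return errors.Wrap(err, "deleting leftover dummy interface")
		}
	}
	dummy := &netlink.Dummy{LinkAttrs: netlink.NewLinkAttrs()}
	dummy.LinkAttrs.Name = "vethwedu"
	if err := netlink.LinkAdd(dummy); err != nil {
		return errors.Wrap(err, "creating dummy interface")
	}
	// Deleting in a defer means the dummy disappears even when a later step
	// (such as LinkSetMasterByIndex) fails, shrinking the leak window.
	defer func() {
		_ = netlink.LinkDel(dummy) // best-effort cleanup
	}()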

@rade
Member

rade commented Sep 26, 2018

Maybe you can simplify ...

Just call LinkAddIfNotExist instead.
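
(I have not checked the exact helper referred to here; presumably it wraps netlink.LinkAdd and treats "file exists" as success. A hypothetical sketch of that pattern, for illustration only:)

	// Hypothetical helper sketch: add a link, tolerating one that already exists
	// with the same name (netlink surfaces the kernel error as syscall.EEXIST).
	func linkAddIfNotExist(link netlink.Link) error {
		if err := netlink.LinkAdd(link); err != nil && err != syscall.EEXIST {
			return errors.Wrap(err, "creating link")
		}
		return nil
	}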

@brb
Contributor

brb commented Sep 26, 2018

FATA: 2018/09/26 06:41:38.227346 setting dummy interface master: exchange full

This indicates that there were 1024 veth pairs connected to the weave bridge. @nguyenquangminh0711 How many pods did you run on the failing node?
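
To see how close a node is to that limit, it is enough to count the interfaces whose master is the weave bridge. A small standalone sketch using the vishvananda/netlink package (the same library Weave uses), roughly equivalent to ip -o link | grep 'master weave', written here for illustration:

	package main

	import (
		"fmt"
		"log"

		"github.com/vishvananda/netlink"
	)

	func main() {
		// Count links attached to the "weave" bridge.
		bridge, err := netlink.LinkByName("weave")
		if err != nil {
			log.Fatalf("looking up weave bridge: %v", err)
		}
		links, err := netlink.LinkList()
		if err != nil {
			log.Fatalf("listing links: %v", err)
		}
		attached := 0
		for _, link := range links {
			if link.Attrs().MasterIndex == bridge.Attrs().Index {
				attached++
			}
		}
		fmt.Printf("%d interfaces attached to the weave bridge\n", attached)
	}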

@0x2c7
Author

0x2c7 commented Sep 26, 2018

@brb Not many, just 41 pods on that node. And most of the pods are stuck in ContainerCreating because they cannot be started.

@brb
Contributor

brb commented Sep 26, 2018

@nguyenquangminh0711 Can you run ip -o link from the node?

@0x2c7
Author

0x2c7 commented Sep 26, 2018

@brb When running your command, there are 1023 items. It looks like the count would exceed the limit of 1024 on a new restart. The details of the ip links:
https://gist.github.com/nguyenquangminh0711/4dcb3926b6b8e00255e34cf9c5c7bfa6

Btw, I'm noticing that pods on other nodes are starting to restart randomly, although they haven't collapsed completely like the node I'm investigating.

[Screenshot attached: screen shot 2018-09-26 at 22 07 57]

@brb
Contributor

brb commented Sep 27, 2018

@nguyenquangminh0711 Thanks. Can you paste the kubelet logs?

@0x2c7
Author

0x2c7 commented Sep 28, 2018

@brb I fetched the logs from the nodes, including kube-controller-manager, kube-proxy, and kube-scheduler of the failing node. That's all I can find in /var/log. Not sure whether that's what you are asking for? https://gist.github.com/nguyenquangminh0711/39835eedd752986a46dc8554832d2ffc

I observe a lot of failures when the node tries to read from etcd. I don't know whether that relates to the issue.

@brb
Contributor

brb commented Oct 1, 2018

@nguyenquangminh0711 Do you use systemd? If yes, can you run journalctl -r -u kubelet on the worker node with the 1023 interfaces? Otherwise, the logs should be located in /var/log/kubelet.log.

I'm interested to see whether there were any errors reported by kubelet when removing the interfaces via CNI.

@0x2c7
Author

0x2c7 commented Oct 4, 2018

Hi @brb, sorry for my late response. Our logs are centralized and not indexed properly, so I had to take some extra steps to extract them. Here are the logs of the failed node from startup until it was killed:

https://gist.githubusercontent.com/nguyenquangminh0711/39dceff835e0d39e365c29dc1d958118/raw/fee8fc835cfc71e9fb9e32936b7d0abe8afaf275/Kubelet%2520logs

It looks like before it goes into the CrashLoop situation, there are tons of log entries like this:

2018-09-25 16:49:43 +0700 kernel: [99825.596071] weave: port 379(vethwepla47d508) entered forwarding state
2018-09-25 16:49:43 +0700 kernel: [99826.012090] weave: port 380(vethwepl7c826e6) entered forwarding state
2018-09-25 16:49:46 +0700 kernel: [99827.706206] weave: port 391(vethwepl1646259) entered forwarding state
2018-09-25 16:49:46 +0700 kernel: [99827.713079] weave: port 391(vethwepl1646259) entered forwarding state
2018-09-25 16:49:46 +0700 kernel: [99827.706206] weave: port 391(vethwepl1646259) entered forwarding state
2018-09-25 16:49:46 +0700 kernel: [99827.713079] weave: port 391(vethwepl1646259) entered forwarding state
2018-09-25 16:49:46 +0700 kernel: [99827.706206] weave: port 391(vethwepl1646259) entered forwarding state
2018-09-25 16:49:46 +0700 kernel: [99827.713079] weave: port 391(vethwepl1646259) entered forwarding state
2018-09-25 16:49:48 +0700 kernel: [99829.180077] weave: port 381(vethweplcdc27c4) entered forwarding state
2018-09-25 16:49:48 +0700 kernel: [99829.180077] weave: port 381(vethweplcdc27c4) entered forwarding state
2018-09-25 16:49:48 +0700 kernel: [99829.180077] weave: port 381(vethweplcdc27c4) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99830.044074] weave: port 382(vethwepl2ca4be2) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99830.684073] weave: port 383(vethweple32abc0) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99831.260076] weave: port 384(vethwepl9b74577) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99831.900134] weave: port 385(vethwepl4101de7) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99830.044074] weave: port 382(vethwepl2ca4be2) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99830.684073] weave: port 383(vethweple32abc0) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99831.260076] weave: port 384(vethwepl9b74577) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99831.900134] weave: port 385(vethwepl4101de7) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99830.044074] weave: port 382(vethwepl2ca4be2) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99830.684073] weave: port 383(vethweple32abc0) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99831.260076] weave: port 384(vethwepl9b74577) entered forwarding state
2018-09-25 16:49:49 +0700 kernel: [99831.900134] weave: port 385(vethwepl4101de7) entered forwarding state
2018-09-25 16:49:50 +0700 kernel: [99832.096068] weave: port 386(vethwepl8e2a2b2) entered forwarding state
2018-09-25 16:49:50 +0700 kernel: [99832.284074] weave: port 387(vethwepl2e24511) entered forwarding state
2018-09-25 16:49:50 +0700 kernel: [99832.096068] weave: port 386(vethwepl8e2a2b2) entered forwarding state
2018-09-25 16:49:50 +0700 kernel: [99832.284074] weave: port 387(vethwepl2e24511) entered forwarding state
2018-09-25 16:49:50 +0700 kernel: [99832.096068] weave: port 386(vethwepl8e2a2b2) entered forwarding state
2018-09-25 16:49:50 +0700 kernel: [99832.284074] weave: port 387(vethwepl2e24511) entered forwarding state
2018-09-25 16:49:51 +0700 kernel: [99832.540077] weave: port 388(vethwepl20f2d0e) entered forwarding state
2018-09-25 16:49:51 +0700 kernel: [99832.860073] weave: port 389(vethwepl9d3a07a) entered forwarding state
2018-09-25 16:49:51 +0700 kernel: [99833.020105] weave: port 390(vethweplb3ecd60) entered forwarding state
2018-09-25 16:49:51 +0700 kernel: [99832.540077] weave: port 388(vethwepl20f2d0e) entered forwarding state
2018-09-25 16:49:51 +0700 kernel: [99832.860073] weave: port 389(vethwepl9d3a07a) entered forwarding state
2018-09-25 16:49:51 +0700 kernel: [99833.020105] weave: port 390(vethweplb3ecd60) entered forwarding state
2018-09-25 16:49:51 +0700 kernel: [99832.540077] weave: port 388(vethwepl20f2d0e) entered forwarding state
2018-09-25 16:49:51 +0700 kernel: [99832.860073] weave: port 389(vethwepl9d3a07a) entered forwarding state
2018-09-25 16:49:51 +0700 kernel: [99833.020105] weave: port 390(vethweplb3ecd60) entered forwarding state
2018-09-25 16:49:54 +0700 kernel: [99834.764564] weave: port 392(vethwepl194c895) entered forwarding state
2018-09-25 16:49:54 +0700 kernel: [99834.771644] weave: port 392(vethwepl194c895) entered forwarding state

@brb
Contributor

brb commented Oct 8, 2018

@nguyenquangminh0711 Thanks.

I was hoping to see more weave-cni del errors which would explain why there were so many veths attached to the bridge.

Anyway, the entered forwarding state log entries at the bottom indicate that there were new containers starting. Each veth name consists of vethwepl + a container short id. Can you find those containers in your system?

@iAnomaly

iAnomaly commented Nov 21, 2018

@brb We're experiencing this on 2.4.1, although I think we saw it on 2.4.0 as well.

From one of the affected worker nodes:

$ ip -o link|wc -l
1032

I have attached the output of ip -o link and the last 15 minutes of sudo journalctl -ru kubelet from the affected node.

ip -o link.txt
kubelet.tar.gz

@murali-reddy
Contributor

@iAnomaly Looking at the ip -o link output you shared, I do see the dummy interface, which should have been deleted.

222911: vethwedu: <BROADCAST,NOARP> mtu 8912 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 02:9a:c1:2b:07:4a brd ff:ff:ff:ff:ff:ff

The kubelet logs indicate that weave is not running. I suspect the weave pod went into CrashLoopBackOff on the node where you saw the above behaviour?

Nov 21 16:28:04 ip-172-20-60-206 kubelet[10866]: E1121 16:28:04.542329   10866 cni.go:259] Error adding network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/2c535c623770ff2b389523c3f5aed82d3d76b05e30b88f1bbf1e3284a124221b: dial tcp 127.0.0.1:6784: connect:         connection refused

This is only a side effect. The root cause is the dummy interface, which should have been deleted after it was created by Weave. Do you by any chance have old Weave pod logs that could indicate why the vethwedu interface was not deleted?

@murali-reddy
Contributor

murali-reddy commented Nov 22, 2018

As pointed out, the dummy interface deletion should have been done in a defer block in the code path below.

https://github.com/weaveworks/weave/blob/v2.5.0/net/bridge.go#L332-L343

Let me know if you are still interested in raising a PR, @nguyenquangminh0711. Otherwise I will raise one.
