
Calico CNI plugin renders new node unusable #70

Open

Description

How to categorize this issue?

If multiple identifiers make sense, you can also state the commands multiple times, e.g.

/area networking
/kind bug
/priority normal

What happened:
This ticket originates from a Slack discussion.

When we provision a new cluster, the pods on a node are sporadically stuck in the ContainerCreating state, with the event Pod sandbox changed, it will be killed and re-created. repeating over and over.
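
For anyone hitting the same symptom, the stuck pods and the repeating event can be surfaced like this (a minimal sketch; <node-name>, <pod-name>, and <namespace> are placeholders for the affected node and pod):

kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>      # lists the pods stuck in ContainerCreating on the node
kubectl describe pod <pod-name> -n <namespace>                                    # the Events section shows the repeating "Pod sandbox changed" message
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>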

In the logs I can see the calico CNI plugin getting installed:

time="2021-02-17T10:14:40Z" level=info msg="Running as a Kubernetes pod" source="install.go:140"
time="2021-02-17T10:14:40Z" level=info msg="Installed /host/opt/cni/bin/bandwidth"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/calico"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/calico-ipam"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/flannel"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/host-local"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/install"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/loopback"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/portmap"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/tuning"
time="2021-02-17T10:14:41Z" level=info msg="Wrote Calico CNI binaries to /host/opt/cni/bin\n"
time="2021-02-17T10:14:41Z" level=info msg="CNI plugin version: v3.17.1\n"
time="2021-02-17T10:14:41Z" level=info msg="/host/secondary-bin-dir is not writeable, skipping"
time="2021-02-17T10:14:41Z" level=info msg="Using CNI config template from CNI_NETWORK_CONFIG environment variable." source="install.go:319"
time="2021-02-17T10:14:41Z" level=info msg="Created /host/etc/cni/net.d/10-calico.conflist"
time="2021-02-17T10:14:41Z" level=info msg="Done configuring CNI.  Sleep= false"

But according to the Slack discussion, something must have removed the CNI configuration again later, which is when the error appears.
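
One way to verify that suspicion would be to check directly on the affected node whether the artifacts the installer wrote are still present (a hedged sketch; the /host/... paths in the log should correspond to these node paths, assuming the usual hostPath mounts):

ls -l /opt/cni/bin/calico /opt/cni/bin/calico-ipam   # CNI binaries written by the install container
cat /etc/cni/net.d/10-calico.conflist                # CNI network config; if this was removed, sandbox creation fails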

What you expected to happen:
A regularly provisioned node on which pods can run.

How to reproduce it (as minimally and precisely as possible):
Unfortunately, I do not know how to reproduce this; it happens only sporadically.

Anything else we need to know?:

Environment:

  • Gardener version (if relevant):
  • Extension version:
  • Kubernetes version (use kubectl version): 1.17.14
  • Cloud provider or hardware configuration: Azure
  • Others:

Metadata

Labels

  • area/networking (Networking related)
  • kind/bug (Bug)
  • lifecycle/rotten (Nobody worked on this for 12 months, final aging stage)
  • priority/3 (Priority; lower number equals higher priority)
