
Calico CNI plugin renders new node unusable #70

Open

Description

How to categorize this issue?

If multiple identifiers make sense, you can also state the commands multiple times, e.g.

/area networking
/kind bug
/priority normal

What happened:
This ticket originates from a Slack discussion.

When we provision a new cluster, the pods on a node are sporadically stuck in the ContainerCreating state, with the event Pod sandbox changed, it will be killed and re-created. repeating over and over.
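
For anyone hitting the same symptom, the stuck pods and the repeating event can be surfaced like this (a minimal sketch; <node-name>, <pod-name>, and <namespace> are placeholders for the affected node and pod):

kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>      # lists the pods stuck in ContainerCreating on the node
kubectl describe pod <pod-name> -n <namespace>                                    # the Events section shows the repeating "Pod sandbox changed" message
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>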

In the logs I can see the calico CNI plugin getting installed:

time="2021-02-17T10:14:40Z" level=info msg="Running as a Kubernetes pod" source="install.go:140"
time="2021-02-17T10:14:40Z" level=info msg="Installed /host/opt/cni/bin/bandwidth"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/calico"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/calico-ipam"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/flannel"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/host-local"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/install"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/loopback"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/portmap"
time="2021-02-17T10:14:41Z" level=info msg="Installed /host/opt/cni/bin/tuning"
time="2021-02-17T10:14:41Z" level=info msg="Wrote Calico CNI binaries to /host/opt/cni/bin\n"
time="2021-02-17T10:14:41Z" level=info msg="CNI plugin version: v3.17.1\n"
time="2021-02-17T10:14:41Z" level=info msg="/host/secondary-bin-dir is not writeable, skipping"
time="2021-02-17T10:14:41Z" level=info msg="Using CNI config template from CNI_NETWORK_CONFIG environment variable." source="install.go:319"
time="2021-02-17T10:14:41Z" level=info msg="Created /host/etc/cni/net.d/10-calico.conflist"
time="2021-02-17T10:14:41Z" level=info msg="Done configuring CNI.  Sleep= false"

But according to the Slack discussion, something must have removed the CNI configuration again later, which is when the error appears.
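
One way to verify that suspicion would be to check directly on the affected node whether the artifacts the installer wrote are still present (a hedged sketch; the /host/... paths in the log should correspond to these node paths, assuming the usual hostPath mounts):

ls -l /opt/cni/bin/calico /opt/cni/bin/calico-ipam   # CNI binaries written by the install container
cat /etc/cni/net.d/10-calico.conflist                # CNI network config; if this was removed, sandbox creation fails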

What you expected to happen:
A regularly provisioned node on which pods can run.

How to reproduce it (as minimally and precisely as possible):
Unfortunately, I do not know how to reproduce this; it happens only sporadically.

Anything else we need to know?:

Environment:

  • Gardener version (if relevant):
  • Extension version:
  • Kubernetes version (use kubectl version): 1.17.14
  • Cloud provider or hardware configuration: Azure
  • Others:

Metadata

Labels

  • area/networking (Networking related)
  • kind/bug (Bug)
  • lifecycle/rotten (Nobody worked on this for 12 months, final aging stage)
  • priority/3 (Priority; lower number equals higher priority)
