
improve how cni and cruntimes work together #11184

Closed
prezha opened this issue Apr 23, 2021 · 13 comments · Fixed by #11185
Assignees
Labels
co/runtime/containerd co/runtime/crio CRIO related issues co/runtime/docker Issues specific to a docker runtime

Comments

@prezha
Contributor

prezha commented Apr 23, 2021

crio is the only container runtime atm that allows specifying the cni network that should be used (via the 'cni_default_network' config param) and, unlike containerd or docker, it doesn't need a custom kubelet '--cni-conf-dir' flag to avoid conflicting CNI configs in the default /etc/cni/net.d, so we can use that to improve how cni & cruntime work together (especially for multinode), while also avoiding later runtime reconfiguration and restart

/assign
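
For illustration, a minimal sketch of the two situations described above (the drop-in file name and the 'kindnet' network name are examples, and it assumes cni_default_network lives under [crio.network] as in upstream crio.conf):

# crio: the CNI network can be selected explicitly, so extra files in
# /etc/cni/net.d don't get picked up by accident
# e.g. /etc/crio/crio.conf.d/10-network.conf
[crio.network]
cni_default_network = "kindnet"

# containerd/docker: no equivalent knob, so the kubelet is pointed at a
# dedicated config dir instead, e.g. --cni-conf-dir=/etc/cni/net.mk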

@afbjorklund
Collaborator

afbjorklund commented Apr 24, 2021

I'm not sure there is anything that minikube can do about this, Kubernetes says that installing a CNI means that it will be used. It doesn't support having software configured that is not supposed to be used, in /etc/crictl.yaml and in /etc/cni/net.d.

So as soon as you add cri-o or cri-containerd-cni, things break. The main reason that docker-ce and containerd.io "work" is that they don't add any CRI configuration to their packages. This also means that they don't work (without dockershim).


I think the only way forward is to remove the configuration from the cri-o and podman packaging.

cri-o

/etc
/etc/cni
/etc/cni/net.d
/etc/cni/net.d/100-crio-bridge.conf
/etc/cni/net.d/200-loopback.conf
/etc/crictl.yaml
/etc/crio
/etc/crio/crio.conf
/etc/crio/crio.conf.d
/etc/crio/crio.conf.d/01-crio-runc.conf
/etc/default
/etc/default/crio

podman

/etc
/etc/cni
/etc/cni/net.d
/etc/cni/net.d/87-podman-bridge.conflist

containerd.io

/etc
/etc/containerd
/etc/containerd/config.toml

cri-containerd-cni

etc/
etc/systemd/
etc/systemd/system/
etc/systemd/system/containerd.service
etc/crictl.yaml
etc/cni/
etc/cni/net.d/
etc/cni/net.d/10-containerd-net.conflist

PS. That containerd systemd unit is just a packaging bug, it was supposed to go in /usr/local/lib/systemd.

But the kubeadm documentation doesn't install containerd from containerd.io, they install it from docker.com
And that means that they have to set up their configuration for CRI, but they get the unit in /usr/lib/systemd.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd

Note that containerd.io (docker-containerd) ships with a configuration that disables CRI by default:

disabled_plugins = ["cri"]

This is why kubeadm has to reconfigure it (re-enable the plugin) as part of the installation instructions:
containerd config default | sudo tee /etc/containerd/config.toml

disabled_plugins = []
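
Whichever way the file is edited, containerd then has to be restarted for the cri plugin to actually load; a sketch (not necessarily minikube's exact provisioning steps):

sudo sed -i 's/^disabled_plugins = \["cri"\]/disabled_plugins = []/' /etc/containerd/config.toml
sudo systemctl restart containerd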

@afbjorklund
Collaborator

afbjorklund commented Apr 24, 2021

You have already seen how the kubeadm autodetection crashes when there is more than one runtime running:

Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock

There is a hack to skip reporting /run/containerd/containerd.sock when /var/run/docker.sock is present.

        foundCRISockets := []string{}
        knownCRISockets := []string{
                // Docker and containerd sockets are special cased below, hence not to be included here
                "/var/run/crio/crio.sock",
        }

        if isSocket(dockerSocket) {
                // the path in dockerSocket is not CRI compatible, hence we should replace it with a CRI compatible socket
                foundCRISockets = append(foundCRISockets, constants.DefaultDockerCRISocket)
        } else if isSocket(containerdSocket) {
                // Docker 18.09 gets bundled together with containerd, thus having both dockerSocket and containerdSocket present.
                // For compatibility reasons, we use the containerd socket only if Docker is not detected.
                foundCRISockets = append(foundCRISockets, containerdSocket)
        }
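
The usual way around that autodetection is to be explicit about the socket, e.g. (a sketch):

sudo kubeadm init --cri-socket=/var/run/crio/crio.sock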

@afbjorklund
Collaborator

afbjorklund commented Apr 24, 2021

Once we start running CRI for docker (with Kubernetes 1.22), it means that we will have to use criSocket for it.

Currently cri-dockerd defaults to running on /var/run/dockershim.sock to avoid users having to configure it... 🤷

https://github.com/dims/cri-dockerd

CRI that connects to Docker Daemon

Usage:
  DockerCRI [flags]

Flags:
      --alsologtostderr                         log to standard error as well as files
      --cni-bin-dir string                      <Warning: Alpha feature> A comma-separated list of full paths of directories in which to search for CNI plugin binaries.
      --cni-cache-dir string                    <Warning: Alpha feature> The full path of the directory in which CNI should store cache files.
      --cni-conf-dir string                     <Warning: Alpha feature> The full path of the directory in which to search for CNI config files
      --container-runtime-endpoint string       The endpoint of remote runtime service. Currently unix socket and tcp endpoints are supported on Linux, while npipe and tcp endpoints are supported on windows.  Examples:'unix:///var/run/dockershim.sock', 'npipe:////./pipe/dockershim' (default "unix:///var/run/dockershim.sock")
      --docker-endpoint string                  Use this for the docker endpoint to communicate with. (default "unix:///var/run/docker.sock")
      --dockershim-root-directory string        Path to the dockershim root directory. (default "/var/lib/dockershim")
  -h, --help                                    help for DockerCRI

But you could use --container-runtime-endpoint=unix:///var/run/cri-dockerd/cri-dockerd.sock or something.
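
For completeness, the kubelet side of that would then look something like this (a sketch; the socket path is just the example name from above, using the kubelet flags of that era):

kubelet --container-runtime=remote \
        --container-runtime-endpoint=unix:///var/run/cri-dockerd/cri-dockerd.sock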

@afbjorklund afbjorklund added co/runtime/containerd co/runtime/crio CRIO related issues co/runtime/docker Issues specific to a docker runtime labels Apr 24, 2021
@prezha
Contributor Author

prezha commented Apr 25, 2021

i think that the fact that "If there are multiple CNI configuration files in the directory, the kubelet uses the configuration file that comes first by name in lexicographic order" (ref: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#cni) still holds just reflects its lower priority and the lack of time for (properly) solving this upstream, which is understandable to some extent: if you are using kubeadm to set up a k8s cluster, you need to fulfil some strict prereqs and follow specific instructions for the single 'approved' combination of os/cr/cni/... in order for it to work

now, minikube tries to take that burden off the user so they can still have the 'local kubernetes experience', and thus tries to figure out what it can do best with all the different combinations that users have on their development machines or might want to try out (eg, cgroupfs comes to mind - ref: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#kubeadm-blocks-waiting-for-control-plane-during-installation, https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/#using-the-cgroupfs-driver, https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers, etc.), while also keeping itself fast and as small as possible (which are somewhat conflicting requirements, given the complexity minikube tries to solve here)

i would expect that this specific issue will eventually be addressed upstream and that workarounds such as this one would not be necessary, and in the meantime - we can keep trying to provide the best possible experience for minikube users :)

what gives hope is that this specific 'issue' of conflicting cohabiting cni configurations was already elegantly (better yet - properly) solved by crio (and podman) (ref: containers/podman#2370 and cri-o/cri-o#2121)

or, if i understood what you are suggesting above, we might also decide to take a different route and create separate 'environments' for those different combinations, but then that would probably impose some other challenges - eg, the overall 'size' of such a solution, potential additional slowdowns, the complexity of maintaining those different combos, etc.

all in all, in the meantime, the pr connected to this issue is an improvement over the previous initial implementation: it centralises where and how the 'necessary workaround' is triggered (in cni.configureCNI, where it can be more easily changed/expanded/switched off if needed), the changes are adapted to the specific cruntime & cni used, and they happen early enough that additional restarts (and thus slowdowns) are avoided

@afbjorklund
Collaborator

afbjorklund commented Apr 25, 2021

The "lexicographic order" thing doesn't work, for instance the kindnet CNI is bootstrapped from a pod. So it expects Kubernetes to be crash-looping, and any installed CNI will take precedence (over a not-yet-existing one) and prevent it from working correctly.

So we need to ask podman to stay away from using /etc/cni/net.d (but use a different directory*), and to not install any container runtimes that we don't expect to be using. We already delete /etc/crictl.yaml, but need to delete files in /etc/cni/net.d too.

* /etc/containers/containers.conf

[network]

# Path to directory where CNI plugin binaries are located.
#
# cni_plugin_dirs = ["/usr/libexec/cni"]

# The network name of the default CNI network to attach pods to.
# default_network = "podman"

# Path to the directory where CNI configuration files are located.
#
# network_config_dir = "/etc/cni/net.d/"
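
So the override being suggested would look something like this (a sketch; the directory name is illustrative):

# /etc/containers/containers.conf - keep podman's CNI config out of the
# directory the kubelet scans
[network]
network_config_dir = "/etc/cni/podman.d/"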

Same with flannel, or what have you.

      volumes:
      - name: cni
        hostPath:
          path: /etc/cni/net.d

https://github.com/flannel-io/flannel/blob/master/Documentation/kube-flannel.yml

@prezha
Contributor Author

prezha commented Apr 25, 2021

that's why in this 'workaround' we point to "/etc/cni/net.mk" to be used for the kindnet cni instead, so that the kindnet pod gets initialised correctly while other 'relevant' pods (like coredns) stay pending, waiting for kindnet to become operational?
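
For reference, a sketch of how that kind of kubelet override is typically wired up with kubeadm-style systemd units (the drop-in name is illustrative and the exact mechanism minikube uses may differ):

# /etc/systemd/system/kubelet.service.d/20-cni.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--cni-conf-dir=/etc/cni/net.mk"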

@afbjorklund
Collaborator

afbjorklund commented Apr 25, 2021

Renaming the cni directory (#10985) or installing a fake cni (#10384) are strange workarounds, I think. :-P

Would prefer something more similar to upstream Kubernetes, especially if CRI and CNI are now mandatory ?

@prezha
Contributor Author

prezha commented Apr 25, 2021

completely agree, and atm it's only the crio container runtime where it just takes the 'cni_default_network' flag (so no need for any kind of workaround)

@prezha
Contributor Author

prezha commented Apr 25, 2021

if i remember correctly, cilium, for example, does not need a workaround either as it restarts the 'relevant' pods (incl coredns) after it's deployed - that is a decent solution i think, but not all cnis use it

what do you suggest?

@afbjorklund
Collaborator

in the case of crio cr it only takes 'cni_default_network' flag (so no need for any kind of workaround)

CRI-O is perfectly fine to install /etc/crictl.yaml (for CRI) and /etc/cni/net.d/100-crio-bridge.conf (for CNI) -
but if and only if it is actually the selected runtime. Which means that it can't install them with the binary.

@afbjorklund
Collaborator

afbjorklund commented Apr 25, 2021

what do you suggest?

I think we need to leave /etc/crictl.yaml and /etc/cni/net.d empty, by removing them from the packaging.
And then populate them as part of the startup procedure, when we know which CRI and which CNI ?

Also, the long-term plan is to install only the container runtime that is actually going to be used: #9989
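
A sketch of what populating /etc/crictl.yaml at start time could look like once the runtime is known (values illustrative; crio used as the example):

# /etc/crictl.yaml - written only after the runtime has been selected
runtime-endpoint: unix:///var/run/crio/crio.sock
image-endpoint: unix:///var/run/crio/crio.sock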

@afbjorklund
Collaborator

afbjorklund commented Apr 25, 2021

Besides changing the Podman configuration, I also need to change the BuildKit configuration.

  --containerd-worker-net value              worker network type (auto, cni or host) (default: "auto")
  --containerd-cni-config-path value         path of cni config file (default: "/etc/buildkit/cni.json")
  --containerd-cni-binary-dir value          path of cni binary files (default: "/opt/cni/bin")

The idea is to have them use a separate bridge for building, as opposed to cri-o or containerd ?
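
i.e. something along the lines of (a sketch; the paths are just the defaults already shown above, with the worker net forced to cni):

buildkitd \
  --containerd-worker-net=cni \
  --containerd-cni-config-path=/etc/buildkit/cni.json \
  --containerd-cni-binary-dir=/opt/cni/bin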

@prezha
Contributor Author

prezha commented Apr 25, 2021

that makes sense too - would you mind creating a pr so we can check whether it solves this issue?
