
Solves docker cruntime notavail bug for --driver=none #16936

Closed
wants to merge 2 commits

Conversation

@x7upLime (Contributor) commented Jul 23, 2023

Solves the docker cruntime not-available bug, which occurs whenever the none driver is used with any container runtime other than docker.

fixes #5549
fixes #10908
fixes #16722

minikube start --driver=none --container-runtime=containerd

ubuntu@minikube:~/minikube$ sudo -E ./out/minikube start --driver=none --container-runtime=containerd
😄  minikube v1.31.1 on Ubuntu 22.10 (kvm/amd64)
✨  Using the none driver based on user configuration
❗  Using the 'containerd' runtime with the 'none' driver is an untested configuration!
👍  Starting control plane node minikube in cluster minikube
🤹  Running on localhost (CPUs=4, Memory=3905MB, Disk=4781MB) ...
ℹ️  OS release is Ubuntu 22.10
📦  Preparing Kubernetes v1.27.3 on containerd 1.6.12-0ubuntu1 ...
    ▪ kubelet.resolv-conf=/run/systemd/resolve/resolv.conf
    > kubeadm.sha256:  64 B / 64 B [-------------------------] 100.00% ? p/s 0s
    > kubectl.sha256:  64 B / 64 B [-------------------------] 100.00% ? p/s 0s
    > kubelet.sha256:  64 B / 64 B [-------------------------] 100.00% ? p/s 0s
    > kubelet:  101.24 MiB / 101.24 MiB [----------] 100.00% 61.89 MiB p/s 1.8s
    > kubectl:  46.98 MiB / 46.98 MiB [------------] 100.00% 22.64 MiB p/s 2.3s
    > kubeadm:  45.93 MiB / 45.93 MiB [--------------] 100.00% 1.68 MiB p/s 28s
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🤹  Configuring local host environment ...

❗  The 'none' driver is designed for experts who need to integrate with an existing VM
💡  Most users should use the newer 'docker' driver instead, which does not require root!
📘  For more information, see: https://minikube.sigs.k8s.io/docs/reference/drivers/none/

❗  kubectl and minikube configuration will be stored in /home/ubuntu
❗  To use kubectl or minikube commands as your own user, you may need to relocate them. For example, to overwrite your own settings, run:

    ▪ sudo mv /home/ubuntu/.kube /home/ubuntu/.minikube $HOME
    ▪ sudo chown -R $USER $HOME/.kube $HOME/.minikube

💡  This can also be done automatically by setting the env var CHANGE_MINIKUBE_NONE_USER=true
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: default-storageclass, storage-provisioner
💡  kubectl not found. If you need it, try: 'minikube kubectl -- get pods -A'
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

which docker

ubuntu@minikube:~/minikube$ which docker

which containerd

ubuntu@minikube:~/minikube$ which containerd
/usr/bin/containerd

ubuntu@minikube:~/minikube$ sudo -E ./out/minikube kubectl -- get po -A

Needs sudo sysctl fs.protected_regular=0; check HOST_JUJU_LOCK_PERMISSION.

ubuntu@minikube:~/minikube$ sudo -E ./out/minikube kubectl -- get po -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   coredns-5d78c9869d-krprh           1/1     Running   0          5m59s
kube-system   etcd-minikube                      1/1     Running   0          6m11s
kube-system   kube-apiserver-minikube            1/1     Running   0          6m11s
kube-system   kube-controller-manager-minikube   1/1     Running   0          6m11s
kube-system   kube-proxy-mh7bh                   1/1     Running   0          5m58s
kube-system   kube-scheduler-minikube            1/1     Running   0          6m11s
kube-system   storage-provisioner                1/1     Running   0          6m10s

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 23, 2023
@k8s-ci-robot (Contributor)

Hi @x7upLime. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 23, 2023
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: x7upLime
Once this PR has been reviewed and has the lgtm label, please assign afbjorklund for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 23, 2023
@minikube-bot (Collaborator)

Can one of the admins verify this patch?

// Libraries shouldn't panic, but there is no way for drivers to return error :(
if err != nil {
	klog.Fatalf("unable to create container runtime: %v", err)
}
@x7upLime (Contributor, Author)

There is just no point in initializing d.exec and d.runtime at this phase, since Init is only called before libmachine's JSON marshal/unmarshal mechanism, i.e. the values would not survive it.
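
For illustration only (simplified, hypothetical types, not minikube's actual structs): unexported fields such as exec and runtime are skipped entirely by encoding/json, so anything stored in them before libmachine's marshal/unmarshal round trip comes back as nil afterwards. A minimal sketch:

package main

import (
	"encoding/json"
	"fmt"
)

// Runner stands in for command.Runner; all names here are illustrative only.
type Runner interface{ Run(cmd string) error }

type execRunner struct{}

func (execRunner) Run(cmd string) error { return nil }

type Driver struct {
	URL  string
	exec Runner // unexported: encoding/json drops it on Marshal and never fills it on Unmarshal
}

func main() {
	d := Driver{URL: "https://127.0.0.1:8443", exec: execRunner{}}

	b, _ := json.Marshal(d) // {"URL":"https://127.0.0.1:8443"} -- exec is gone
	var restored Driver
	_ = json.Unmarshal(b, &restored)

	fmt.Printf("%s\n", b)
	fmt.Println(restored.exec == nil) // true: the runner did not survive the round trip
}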

d.exec = runner

fmt.Printf("We've put a command runner inside Driver in PrecreateCheck: %#v\n\n\n", d.exec)

@x7upLime (Contributor, Author)

We initialize the runner/runtime here, since this is the first driver method that gets called when creating the machine.
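
For context, a sketch of the PreCreateCheck this comment refers to, reconstructed from the removed context lines of the later patch in this thread (the debug Printf quoted above is omitted):

// PreCreateCheck checks for correct privileges and dependencies
func (d *Driver) PreCreateCheck() error {
	runner := command.NewExecRunner(true)
	runtime, err := cruntime.New(cruntime.Config{Type: d.NodeConfig.ContainerRuntime, Runner: runner})
	// Libraries shouldn't panic, but there is no way for drivers to return error :(
	if err != nil {
		klog.Fatalf("unable to create container runtime: %v", err)
	}

	// Store both on the driver so later methods can reuse them,
	// then make sure the configured runtime is actually available.
	d.runtime = runtime
	d.exec = runner

	return d.runtime.Available()
}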

@@ -125,6 +131,9 @@ func (d *Driver) GetURL() (string, error) {

// GetState returns the state that the host is in (running, stopped, etc)
func (d *Driver) GetState() (state.State, error) {
runner := command.NewExecRunner(true)
d.exec = runner

@x7upLime (Contributor, Author) Jul 23, 2023

We also initialize a runner here, since it is expected after the provisioning phase, for enabling addons.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 26, 2023
@medyagh (Member) left a comment

Please paste the output of minikube start under the following conditions:

1. no docker in PATH ($ which docker)
2. containerd installed
3. minikube start --driver=none --container-runtime=containerd
4. kubectl get pods -A

@x7upLime (Contributor, Author)

Reminder for myself:
As suggested by @medyagh, I will try to move the runtime and exec fields of the none driver struct into the BaseDriver or CommonDriver, to see where that gets us for this and maybe other issues, in a separate PR.

@x7upLime (Contributor, Author)

Unfortunately, putting the runtime/exec fields inside the CommonDriver still won't get them past the marshal/unmarshal stage of the driver lifecycle (a minimal sketch of why follows the patch below):

$ sudo -E ./out/minikube start --driver=none --container-runtime=containerd

😄  minikube v1.31.1 on Ubuntu 22.10 (kvm/amd64)
✨  Using the none driver based on user configuration
❗  Using the 'containerd' runtime with the 'none' driver is an untested configuration!
👍  Starting control plane node minikube in cluster minikube
🤹  Running on localhost (CPUs=4, Memory=3905MB, Disk=4781MB) ...
🤦  StartHost failed, but will try again: creating host: create: precreate: exec: "docker": executable file not found in $PATH
😿  Failed to start none bare metal machine. Running "minikube delete" may fix it: boot lock: unable to open /tmp/juju-mk827fcceb822ddd434756f789a161c4dd798db6: permission denied

❌  Exiting due to HOST_JUJU_LOCK_PERMISSION: Failed to start host: boot lock: unable to open /tmp/juju-mk827fcceb822ddd434756f789a161c4dd798db6: permission denied
💡  Suggestion: Run 'sudo sysctl fs.protected_regular=0', or try a driver which does not require root, such as '--driver=docker'
🍿  Related issue: https://github.com/kubernetes/minikube/issues/6391

The patch that I used:

diff --git a/pkg/drivers/common.go b/pkg/drivers/common.go
index d7e3d3c44..da55b9726 100644
--- a/pkg/drivers/common.go
+++ b/pkg/drivers/common.go
@@ -34,6 +34,8 @@ import (
 	"github.com/pkg/errors"
 
 	"k8s.io/klog/v2"
+	"k8s.io/minikube/pkg/minikube/command"
+	"k8s.io/minikube/pkg/minikube/cruntime"
 	"k8s.io/minikube/pkg/util"
 )
 
@@ -86,7 +88,10 @@ func CreateRawDisk(diskPath string, sizeMB int) error {
 }
 
 // CommonDriver is the common driver base class
-type CommonDriver struct{}
+type CommonDriver struct {
+	Runtime cruntime.Manager
+	Exec    command.Runner
+}
 
 // GetCreateFlags is not implemented yet
 func (d *CommonDriver) GetCreateFlags() []mcnflag.Flag {
diff --git a/pkg/drivers/none/none.go b/pkg/drivers/none/none.go
index 060150c26..55b0ae2ce 100644
--- a/pkg/drivers/none/none.go
+++ b/pkg/drivers/none/none.go
@@ -48,8 +48,6 @@ type Driver struct {
 	*drivers.BaseDriver
 	*pkgdrivers.CommonDriver
 	URL        string
-	runtime    cruntime.Manager
-	exec       command.Runner
 	NodeConfig Config
 }
 
@@ -62,28 +60,29 @@ type Config struct {
 
 // NewDriver returns a fully configured None driver
 func NewDriver(c Config) *Driver {
+	runner := command.NewExecRunner(true)
+	runtime, err := cruntime.New(cruntime.Config{Type: c.ContainerRuntime, Runner: runner})
+	// Libraries shouldn't panic, but there is no way for drivers to return error :(
+	if err != nil {
+		klog.Fatalf("unable to create container runtime: %v", err)
+	}
+
 	return &Driver{
 		BaseDriver: &drivers.BaseDriver{
 			MachineName: c.MachineName,
 			StorePath:   c.StorePath,
 		},
 		NodeConfig: c,
+		CommonDriver: &pkgdrivers.CommonDriver{
+			Runtime: runtime,
+			Exec:    runner,
+		},
 	}
 }
 
 // PreCreateCheck checks for correct privileges and dependencies
 func (d *Driver) PreCreateCheck() error {
-	runner := command.NewExecRunner(true)
-	runtime, err := cruntime.New(cruntime.Config{Type: d.NodeConfig.ContainerRuntime, Runner: runner})
-	// Libraries shouldn't panic, but there is no way for drivers to return error :(
-	if err != nil {
-		klog.Fatalf("unable to create container runtime: %v", err)
-	}
-
-	d.runtime = runtime
-	d.exec = runner
-
-	return d.runtime.Available()
+	return d.Runtime.Available()
 }
 
 // Create a host using the driver's config
@@ -129,8 +128,8 @@ func (d *Driver) GetURL() (string, error) {
 
 // GetState returns the state that the host is in (running, stopped, etc)
 func (d *Driver) GetState() (state.State, error) {
-	runner := command.NewExecRunner(true)
-	d.exec = runner
+	// runner := command.NewExecRunner(true)
+	// d.Exec = runner
 
 	hostname, port, err := kubeconfig.Endpoint(d.BaseDriver.MachineName)
 	if err != nil {
@@ -139,7 +138,7 @@ func (d *Driver) GetState() (state.State, error) {
 	}
 
 	// Confusing logic, as libmachine.Stop will loop until the state == Stopped
-	ast, err := kverify.APIServerStatus(d.exec, hostname, port)
+	ast, err := kverify.APIServerStatus(d.Exec, hostname, port)
 	if err != nil {
 		return ast, err
 	}
@@ -149,17 +148,17 @@ func (d *Driver) GetState() (state.State, error) {
 		return state.Running, nil
 	}
 
-	return kverify.ServiceStatus(d.exec, "kubelet"), nil
+	return kverify.ServiceStatus(d.Exec, "kubelet"), nil
 }
 
 // Kill stops a host forcefully, including any containers that we are managing.
 func (d *Driver) Kill() error {
-	if err := sysinit.New(d.exec).ForceStop("kubelet"); err != nil {
+	if err := sysinit.New(d.Exec).ForceStop("kubelet"); err != nil {
 		klog.Warningf("couldn't force stop kubelet. will continue with kill anyways: %v", err)
 	}
 
 	// First try to gracefully stop containers
-	containers, err := d.runtime.ListContainers(cruntime.ListContainersOptions{})
+	containers, err := d.Runtime.ListContainers(cruntime.ListContainersOptions{})
 	if err != nil {
 		return errors.Wrap(err, "containers")
 	}
@@ -167,18 +166,18 @@ func (d *Driver) Kill() error {
 		return nil
 	}
 	// Try to be graceful before sending SIGKILL everywhere.
-	if err := d.runtime.StopContainers(containers); err != nil {
+	if err := d.Runtime.StopContainers(containers); err != nil {
 		return errors.Wrap(err, "stop")
 	}
 
-	containers, err = d.runtime.ListContainers(cruntime.ListContainersOptions{})
+	containers, err = d.Runtime.ListContainers(cruntime.ListContainersOptions{})
 	if err != nil {
 		return errors.Wrap(err, "containers")
 	}
 	if len(containers) == 0 {
 		return nil
 	}
-	if err := d.runtime.KillContainers(containers); err != nil {
+	if err := d.Runtime.KillContainers(containers); err != nil {
 		return errors.Wrap(err, "kill")
 	}
 	return nil
@@ -191,7 +190,7 @@ func (d *Driver) Remove() error {
 	}
 	klog.Infof("Removing: %s", cleanupPaths)
 	args := append([]string{"rm", "-rf"}, cleanupPaths...)
-	if _, err := d.exec.RunCmd(exec.Command("sudo", args...)); err != nil {
+	if _, err := d.Exec.RunCmd(exec.Command("sudo", args...)); err != nil {
 		klog.Errorf("cleanup incomplete: %v", err)
 	}
 	return nil
@@ -199,7 +198,7 @@ func (d *Driver) Remove() error {
 
 // Restart a host
 func (d *Driver) Restart() error {
-	return sysinit.New(d.exec).Restart("kubelet")
+	return sysinit.New(d.Exec).Restart("kubelet")
 }
 
 // Start a host
@@ -218,18 +217,18 @@ func (d *Driver) Start() error {
 
 // Stop a host gracefully, including any containers that we are managing.
 func (d *Driver) Stop() error {
-	if err := sysinit.New(d.exec).Stop("kubelet"); err != nil {
+	if err := sysinit.New(d.Exec).Stop("kubelet"); err != nil {
 		klog.Warningf("couldn't stop kubelet. will continue with stop anyways: %v", err)
-		if err := sysinit.New(d.exec).ForceStop("kubelet"); err != nil {
+		if err := sysinit.New(d.Exec).ForceStop("kubelet"); err != nil {
 			klog.Warningf("couldn't force stop kubelet. will continue with stop anyways: %v", err)
 		}
 	}
-	containers, err := d.runtime.ListContainers(cruntime.ListContainersOptions{})
+	containers, err := d.Runtime.ListContainers(cruntime.ListContainersOptions{})
 	if err != nil {
 		return errors.Wrap(err, "containers")
 	}
 	if len(containers) > 0 {
-		if err := d.runtime.StopContainers(containers); err != nil {
+		if err := d.Runtime.StopContainers(containers); err != nil {
 			return errors.Wrap(err, "stop containers")
 		}
 	}
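
A minimal sketch (simplified, hypothetical types, not minikube's actual code) of why the exported CommonDriver fields still fail the round trip: an exported interface field is marshalled, but encoding/json cannot decode JSON back into an interface type, so Unmarshal errors out and the field stays nil:

package main

import (
	"encoding/json"
	"fmt"
)

// Runner stands in for command.Runner; all names here are illustrative only.
type Runner interface{ Run(cmd string) error }

type execRunner struct{}

func (execRunner) Run(cmd string) error { return nil }

type CommonDriver struct {
	Exec Runner // exported, so it is marshalled -- but it cannot be unmarshalled
}

func main() {
	d := CommonDriver{Exec: execRunner{}}

	b, _ := json.Marshal(d) // {"Exec":{}}
	var restored CommonDriver
	err := json.Unmarshal(b, &restored)

	fmt.Printf("%s\n", b)
	fmt.Println(err)                  // json: cannot unmarshal object into Go struct field CommonDriver.Exec of type main.Runner
	fmt.Println(restored.Exec == nil) // true
}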

@spowelljr (Member)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 24, 2023
@spowelljr (Member)

/test pull-minikube-build

@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 16936) |
+----------------+----------+---------------------+
| minikube start | 50.5s    | 49.9s               |
| enable ingress | 28.2s    | 28.4s               |
+----------------+----------+---------------------+

Times for minikube start: 48.8s 53.0s 51.4s 50.8s 48.7s
Times for minikube (PR 16936) start: 47.5s 50.8s 49.9s 50.3s 51.0s

Times for minikube ingress: 29.0s 28.0s 27.6s 28.6s 27.6s
Times for minikube (PR 16936) ingress: 28.6s 27.2s 28.2s 28.6s 29.6s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 16936) |
+----------------+----------+---------------------+
| minikube start | 23.2s    | 23.6s               |
| enable ingress | 20.8s    | 21.4s               |
+----------------+----------+---------------------+

Times for minikube start: 22.0s 24.1s 21.4s 23.8s 24.7s
Times for minikube (PR 16936) start: 22.7s 24.8s 21.5s 25.1s 23.7s

Times for minikube ingress: 20.8s 20.8s 20.8s 20.8s 20.8s
Times for minikube (PR 16936) ingress: 20.8s 20.8s 21.8s 20.8s 22.8s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 16936) |
+----------------+----------+---------------------+
| minikube start | 22.0s    | 21.8s               |
| enable ingress | 29.1s    | 26.5s               |
+----------------+----------+---------------------+

Times for minikube start: 20.3s 20.2s 22.6s 23.1s 23.8s
Times for minikube (PR 16936) start: 23.7s 19.4s 23.6s 22.4s 20.0s

Times for minikube ingress: 31.3s 31.3s 19.3s 32.3s 31.3s
Times for minikube (PR 16936) ingress: 31.3s 19.3s 31.3s 19.3s 31.3s

@minikube-pr-bot

These are the flake rates of all failed tests.

Environment Failed Tests Flake Rate (%)
KVM_Linux TestMultiNode/serial/StartAfterStop (gopogh) 0.00 (chart)
none_Linux TestAddons/StoppedEnableDisable (gopogh) 0.00 (chart)
none_Linux TestCertExpiration (gopogh) 0.00 (chart)
none_Linux TestErrorJSONOutput (gopogh) 0.00 (chart)
none_Linux TestJSONOutput/start/parallel/DistinctCurrentSteps (gopogh) 0.00 (chart)
none_Linux TestJSONOutput/start/parallel/IncreasingCurrentSteps (gopogh) 0.00 (chart)
none_Linux TestJSONOutput/stop/Command (gopogh) 0.00 (chart)
none_Linux TestKubernetesUpgrade (gopogh) 0.00 (chart)
none_Linux TestPause/serial/DeletePaused (gopogh) 0.00 (chart)
none_Linux TestRunningBinaryUpgrade (gopogh) 0.00 (chart)
none_Linux TestStoppedBinaryUpgrade/MinikubeLogs (gopogh) 0.00 (chart)
none_Linux TestStoppedBinaryUpgrade/Upgrade (gopogh) 0.00 (chart)
QEMU_macOS TestFunctional/parallel/CertSync (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/DashboardCmd (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/DockerEnv/bash (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ImageCommands/ImageBuild (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ImageCommands/ImageListJson (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ImageCommands/ImageListTable (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ImageCommands/ImageLoadDaemon (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ImageCommands/ImageLoadFromFile (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ImageCommands/ImageReloadDaemon (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ImageCommands/ImageSaveToFile (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ImageCommands/ImageTagAndLoadDaemon (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/NodeLabels (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/NonActiveRuntimeDisabled (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/PersistentVolumeClaim (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ServiceCmd/DeployApp (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ServiceCmd/Format (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ServiceCmd/HTTPS (gopogh) 2.04 (chart)
QEMU_macOS TestFunctional/parallel/ServiceCmd/JSONOutput (gopogh) 2.04 (chart)
More tests... Continued...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

@spowelljr (Member) left a comment

@x7upLime It looks like, because you removed setting exec and runtime in NewDriver, they're nil and can run into nil panics.
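
An illustrative sketch of that failure mode (simplified, hypothetical types, not minikube's code): once a field is left nil, the first method call on it panics:

package main

import "fmt"

// Manager stands in for cruntime.Manager; illustrative only.
type Manager interface{ Available() error }

type Driver struct {
	Runtime Manager // never re-initialized, so it stays nil
}

func main() {
	d := &Driver{}

	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // runtime error: invalid memory address or nil pointer dereference
		}
	}()

	_ = d.Runtime.Available() // panics: method call on a nil interface value
}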

@spowelljr spowelljr added the do-not-merge/failing-test Indicates that a PR should not merge because it has a failing test label Aug 25, 2023
@x7upLime x7upLime closed this Jan 16, 2024
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/failing-test Indicates that a PR should not merge because it has a failing test ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
6 participants