virtualbox: minikube ip changes after laptop suspend/resume #9479

Open
elliott-davis opened this issue Oct 16, 2020 · 18 comments
Labels
area/networking networking issues co/virtualbox kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@elliott-davis

To start minikube, I run:

minikube start --driver=virtualbox --cpus=8 --memory=10240 --disk-size=40g --host-only-cidr=100.121.20.1/24 \
--nat-nic-type=Am79C973 --host-only-nic-type=Am79C973 --disable-driver-mounts=true \
--docker-opt max-concurrent-downloads=10

This starts minikube in a desired state. I can then stop/start it all day long.

The issue arises when my laptop suspends after I close the lid. After it resumes, running minikube status eventually returns:

minikube
type: Control Plane
host: running
kubelet: Running
apiserver: Stopped
kubeconfig: Configured

To get the API server running again, I run the exact same minikube start command as above and get:

:smile:  minikube v1.14.0 on Ubuntu 18.04
    ▪ MINIKUBE_ACTIVE_DOCKERD=minikube
:sparkles:  Using the virtualbox driver based on existing profile
:+1:  Starting control plane node minikube in cluster minikube
:runner:  Updating the running virtualbox "minikube" VM ...
:x:  minikube is unable to connect to the VM: dial tcp 172.17.0.1:22: i/o timeout
	This is likely due to one of two reasons:
	- VPN or firewall interference
	- virtualbox network configuration issue
	Suggested workarounds:
	- Disable your local VPN or firewall software
	- Configure your local VPN or firewall to allow access to 172.17.0.1
	- Restart or reinstall virtualbox
	- Use an alternative --vm-driver
	- Use --force to override this connectivity check
:x:  Exiting due to GUEST_PROVISION: Failed to validate network: dial tcp 172.17.0.1:22: i/o timeout
:crying_cat_face:  If the above advice does not help, please let us know: 
:point_right:  https://github.com/kubernetes/minikube/issues/new/choose

The IP address appears to have changed to 172.17.0.1, which is confusing. Since this IP is not routable given my Cisco AnyConnect settings, everything fails.
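One quick way to confirm that the reported address is wrong is to check it against the network passed to --host-only-cidr. The Go sketch below is not minikube code: inHostOnlyCIDR is a hypothetical helper, and it assumes the VM's host-only address should always fall inside that CIDR (as the 100.121.20.128 lease in the systemd-networkd log further down this thread does).

```go
package main

import (
	"fmt"
	"net"
)

// inHostOnlyCIDR reports whether ip lies inside the host-only network given
// in the same form as minikube's --host-only-cidr flag (e.g. "100.121.20.1/24").
// net.ParseCIDR masks off the host bits, so "100.121.20.1/24" yields the
// network 100.121.20.0/24.
func inHostOnlyCIDR(ip, cidr string) (bool, error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, fmt.Errorf("bad CIDR %q: %w", cidr, err)
	}
	addr := net.ParseIP(ip)
	if addr == nil {
		return false, fmt.Errorf("bad IP %q", ip)
	}
	return network.Contains(addr), nil
}

func main() {
	const cidr = "100.121.20.1/24" // value passed to --host-only-cidr above
	for _, ip := range []string{"100.121.20.128", "172.17.0.1"} {
		ok, err := inHostOnlyCIDR(ip, cidr)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s in %s: %v\n", ip, cidr, ok)
	}
}
```

A check like this would have flagged 172.17.0.1 immediately instead of letting the SSH connection time out.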

I'm happy to provide any additional info as needed.

@afbjorklund
Collaborator

Looks like it uses the wrong interface, since 172.17.0.1 is normally the docker0 address.

@afbjorklund afbjorklund added co/virtualbox kind/bug Categorizes issue or PR as related to a bug. labels Oct 16, 2020
@elliott-davis
Author

I'm not sure how it would get that address. I have docker0 on my machine configured to use 100.121.23.2/26.

cscotun0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1320
        inet 10.33.66.190  netmask 255.255.224.0  destination 10.33.66.190
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 500  (UNSPEC)
        RX packets 177302  bytes 142271252 (142.2 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 89390  bytes 7855599 (7.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 100.121.23.2  netmask 255.255.255.192  broadcast 100.121.23.63
        ether 02:42:03:60:e7:70  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 870492  bytes 3968489704 (3.9 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 870492  bytes 3968489704 (3.9 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
vboxnet1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 100.121.20.1  netmask 255.255.255.0  broadcast 100.121.20.255
        ether 0a:00:27:00:00:01  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 23245  bytes 1254903 (1.2 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
wlp0s20f3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.129  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::6536:9531:fdb4:6a6d  prefixlen 64  scopeid 0x20<link>
        ether 3c:f0:11:98:77:0e  txqueuelen 1000  (Ethernet)
        RX packets 35151297  bytes 49313676791 (49.3 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14519370  bytes 3037825689 (3.0 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

@afbjorklund
Collaborator

Not sure either; I will try with your settings. It just happens to be the default Docker network address.

@elliott-davis
Author

It is worth noting that if I run minikube stop and then start it again with the above command, everything works as expected.

@afbjorklund
Collaborator

afbjorklund commented Oct 16, 2020

I suppose you could try minikube ip with some extra verbosity, to see why it fails to detect the IP after restart.

EDIT: Hmm, maybe that only reads the config though

@afbjorklund
Collaborator

The virtualbox driver looks up the MAC address of the host-only adapter and then tries to find that MAC in the output of "ip addr show" inside the VM.

Something like:

$ VBoxManage showvminfo minikube --machinereadable | grep ^hostonlyadapter
hostonlyadapter2="vboxnet0"
$ VBoxManage showvminfo minikube --machinereadable | grep ^macaddress2
macaddress2="08002751D555"
$ minikube ssh ip addr show | grep -A1 -i "08:00:27:51:D5:55"
    link/ether 08:00:27:51:d5:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.99.188/24 brd 192.168.99.255 scope global dynamic eth1

The actual Go code is here: https://github.com/docker/machine/blob/v0.16.2/drivers/virtualbox/virtualbox.go#L710_L802
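The lookup described above can be sketched in Go as follows. This is an illustrative approximation, not the docker/machine implementation linked above: macFromVBox, ipForMAC and the sample "ip addr show" text are hypothetical. What it illustrates is that the parser stops at the next interface header, so if eth1 has no inet line it returns an error; a parser that kept scanning would instead pick up the next interface's address, presumably the VM's own docker0 at 172.17.0.1.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// sampleIPAddr is illustrative, trimmed "ip addr show" output; the eth1 lines
// reuse the MAC and address from the example above.
const sampleIPAddr = `3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    link/ether 08:00:27:51:d5:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.99.188/24 brd 192.168.99.255 scope global dynamic eth1
5: docker0: <BROADCAST,MULTICAST,UP> mtu 1500 state DOWN
    link/ether 02:42:00:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0`

// macFromVBox converts VBoxManage's `"08002751D555"` form into the
// lower-case, colon-separated form printed by "ip addr show".
func macFromVBox(raw string) (string, error) {
	raw = strings.Trim(strings.TrimSpace(raw), `"`)
	if len(raw) != 12 {
		return "", fmt.Errorf("unexpected MAC %q", raw)
	}
	parts := make([]string, 0, 6)
	for i := 0; i < len(raw); i += 2 {
		parts = append(parts, raw[i:i+2])
	}
	hw, err := net.ParseMAC(strings.Join(parts, ":")) // validates the hex
	if err != nil {
		return "", err
	}
	return hw.String(), nil
}

// ipForMAC returns the IPv4 address of the interface whose link/ether
// matches mac, or an error if that interface has no inet line.
func ipForMAC(ipAddrOutput, mac string) (string, error) {
	found := false
	for _, line := range strings.Split(ipAddrOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		// Interface headers ("3: eth1: <...>") start in column 0; a new header
		// after the match means the matching interface had no inet line.
		if found && !strings.HasPrefix(line, " ") {
			break
		}
		switch {
		case fields[0] == "link/ether" && len(fields) > 1 && fields[1] == mac:
			found = true
		case found && fields[0] == "inet" && len(fields) > 1:
			return strings.SplitN(fields[1], "/", 2)[0], nil
		}
	}
	if !found {
		return "", fmt.Errorf("no interface with MAC %s", mac)
	}
	return "", fmt.Errorf("interface with MAC %s has no IPv4 address", mac)
}

func main() {
	mac, err := macFromVBox(`"08002751D555"`) // value of macaddress2 above
	if err != nil {
		fmt.Println(err)
		return
	}
	ip, err := ipForMAC(sampleIPAddr, mac)
	fmt.Println(ip, err)
}
```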

@afbjorklund afbjorklund added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Oct 16, 2020
@elliott-davis
Author

Based on our discussion on Slack:
It appears that suspending the laptop causes systemd-networkd inside the VM to lose carrier on eth1 and not re-acquire an IP when the host resumes.

Oct 16 16:02:21 minikube systemd-networkd[2176]: eth1: DHCPv4 address 100.121.20.128/24
Oct 16 16:02:21 minikube systemd-networkd[2176]: eth1: DHCP: No gateway received from DHCP server.
Oct 16 16:02:21 minikube systemd-networkd[2176]: eth1: Configured
Oct 16 16:02:57 minikube systemd-networkd[2176]: vethe75f291: Gained carrier
Oct 16 16:02:57 minikube systemd-networkd[2176]: docker0: Gained carrier
Oct 16 16:07:21 minikube systemd-networkd[2176]: eth1: DHCP: No gateway received from DHCP server.
Oct 16 16:09:07 minikube systemd-networkd[2176]: eth1: Lost carrier
Oct 16 16:09:07 minikube systemd-networkd[2176]: eth1: DHCP lease lost
Oct 16 16:10:20 minikube systemd-networkd[2176]: eth1: Gained carrier
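To check the same thing in a captured journal, one could replay the eth1 events in order. This hypothetical Go helper (hasDHCPLease, not part of minikube or systemd) assumes the plain-text line format shown above, without color escapes, and treats "DHCPv4 address" as acquiring a lease and "DHCP lease lost" as losing it. "Gained carrier" alone does not count, which matches what the log shows after resume.

```go
package main

import (
	"fmt"
	"strings"
)

// hasDHCPLease replays systemd-networkd journal lines for one link and
// reports whether its most recent DHCP event left it holding a lease.
func hasDHCPLease(journal []string, link string) bool {
	lease := false
	marker := "]: " + link + ": " // e.g. "systemd-networkd[2176]: eth1: "
	for _, line := range journal {
		i := strings.Index(line, marker)
		if i < 0 {
			continue
		}
		msg := line[i+len(marker):]
		switch {
		case strings.HasPrefix(msg, "DHCPv4 address"):
			lease = true
		case strings.HasPrefix(msg, "DHCP lease lost"):
			lease = false
		}
		// Carrier and gateway messages are ignored on purpose.
	}
	return lease
}

func main() {
	journal := []string{
		"Oct 16 16:02:21 minikube systemd-networkd[2176]: eth1: DHCPv4 address 100.121.20.128/24",
		"Oct 16 16:02:21 minikube systemd-networkd[2176]: eth1: DHCP: No gateway received from DHCP server.",
		"Oct 16 16:02:21 minikube systemd-networkd[2176]: eth1: Configured",
		"Oct 16 16:09:07 minikube systemd-networkd[2176]: eth1: Lost carrier",
		"Oct 16 16:09:07 minikube systemd-networkd[2176]: eth1: DHCP lease lost",
		"Oct 16 16:10:20 minikube systemd-networkd[2176]: eth1: Gained carrier",
	}
	fmt.Println("eth1 has DHCP lease:", hasDHCPLease(journal, "eth1"))
}
```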

@afbjorklund
Collaborator

afbjorklund commented Oct 17, 2020

So there are two bugs here:

  1. The virtualbox driver's GetIP returns the docker0 address instead of returning an error when it fails to get the eth1 address
  2. It seems like the systemd-networkd DHCP client never configures the interface again, so it doesn't get an address

Note that even once the bug is fixed and the driver starts waiting for a "new" address, it still takes VirtualBox 1m13s to provide one!
(the "No gateway received from DHCP server." message is just complaining about the response; we have a workaround for that)

@afbjorklund
Collaborator

The changes in systemd between v240 and v244 are mostly related to Wi-Fi, but the logic does change a bit too:

@@ -3452,13 +3363,23 @@ static int link_carrier_gained(Link *link) {
 
         assert(link);
 
-        if (!IN_SET(link->state, LINK_STATE_PENDING, LINK_STATE_UNMANAGED, LINK_STATE_FAILED)) {
+        r = wifi_get_info(link);
+        if (r < 0)
+                return r;
+        if (r > 0) {
+                r = link_reconfigure(link, false);
+                if (r < 0)
+                        return r;
+        }
+
+        if (IN_SET(link->state, LINK_STATE_CONFIGURING, LINK_STATE_CONFIGURED)) {
                 r = link_acquire_conf(link);
                 if (r < 0) {
                         link_enter_failed(link);
                         return r;
                 }
 
+                link_set_state(link, LINK_STATE_CONFIGURING);
                 r = link_request_set_addresses(link);
                 if (r < 0)
                         return r;

It could be worth testing with the new minikube ISO (which has systemd 244.5) to see if it helps.

You can see the current state of systemd-networkd like so:

$ minikube ssh -- networkctl --no-pager list
IDX LINK             TYPE               OPERATIONAL SETUP     
  1 lo               loopback           carrier     unmanaged 
  2 eth0             ether              routable    configured
  3 eth1             ether              routable    configured
  4 sit0             sit                off         unmanaged 
  5 docker0          bridge             routable    unmanaged 
  7 veth0aebfa1      ether              carrier     unmanaged 

6 links listed.

eth0 is the NAT adapter and eth1 is the host-only adapter (see #4938).
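The same check can be scripted. Below is a hypothetical Go helper, operationalState, that reads the OPERATIONAL column for one link from "networkctl --no-pager list" output, assuming the five-column layout shown above. After a resume, eth1 reporting anything other than routable would be consistent with the lost lease.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleNetworkctl is the listing quoted above.
const sampleNetworkctl = `IDX LINK             TYPE               OPERATIONAL SETUP
  1 lo               loopback           carrier     unmanaged
  2 eth0             ether              routable    configured
  3 eth1             ether              routable    configured
  4 sit0             sit                off         unmanaged
  5 docker0          bridge             routable    unmanaged
  7 veth0aebfa1      ether              carrier     unmanaged

6 links listed.`

// operationalState returns the OPERATIONAL column for link, and false if
// the link is not listed. Rows are: IDX LINK TYPE OPERATIONAL SETUP.
func operationalState(out, link string) (string, bool) {
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		if len(f) >= 5 && f[1] == link {
			return f[3], true
		}
	}
	return "", false
}

func main() {
	state, ok := operationalState(sampleNetworkctl, "eth1")
	if !ok {
		fmt.Println("eth1 not listed")
		return
	}
	fmt.Printf("eth1 operational state: %s (routable: %v)\n", state, state == "routable")
}
```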

@afbjorklund
Collaborator

Here is a fix for the driver: machine-drivers/machine@master...afbjorklund:virtualbox-ipaddr

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 15, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 14, 2021
@tstromberg tstromberg added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Feb 18, 2021
@tstromberg
Contributor

@afbjorklund - any luck with merging this?

@tstromberg tstromberg changed the title Minikube ip changes after laptop suspend/resume virtualbox: minikube ip changes after laptop suspend/resume Feb 18, 2021
@tstromberg
Contributor

It's worth noting that this should not happen with the Docker driver any longer, but is likely to happen on other drivers.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 19, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 18, 2021
@sharifelgamal sharifelgamal removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 23, 2021
@sharifelgamal sharifelgamal added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. area/networking networking issues labels Jun 23, 2021
@Pictor13

Pictor13 commented Dec 8, 2021

Is there any update on this issue, or on merging the fix?

I can't use minikube with VirtualBox at all.
If the hyperkit driver is the only one that is really supported, then the virtualbox driver should be clearly deprecated so users don't struggle trying to make it work.

@sharifelgamal
Collaborator

@afbjorklund Has your fix for libmachine been merged?

Projects
None yet
Development

No branches or pull requests

7 participants