nested virt: kvm crash "kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed." #2968

Closed
shoffmeister opened this issue Jul 9, 2018 · 15 comments
Labels
cause/nested-vm-config (When nested VM's appear to play a role)
co/kvm2-driver (KVM2 driver related issues)
kind/bug (Categorizes issue or PR as related to a bug.)
lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)
priority/awaiting-more-evidence (Lowest priority. Possibly useful, but not yet enough support to actually get it done.)

Comments

@shoffmeister

Environment:

Minikube version (use minikube version): 0.28.0

  • OS: Fedora 28 (Workstation)
  • VM Driver: kvm2 (out of 0.28.0 release)
  • ISO version:
    "Boot2DockerURL": "file:///home/stefan/.minikube/cache/iso/minikube-v0.28.0.iso",
    "ISO": "/home/stefan/.minikube/machines/minikube/boot2docker.iso",
  • Others:
    -- VMware Workstation 12 as a type 2 hypervisor running on a Windows 10 Home host; the guest OS (see above) is intended to host minikube.
    -- qemu-kvm -> QEMU emulator version 2.11.2 (qemu-2.11.2-1.fc28)

What happened:
Failure to start minikube with the error message "kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed." after running:

minikube delete
minikube start

What you expected to happen:
Successful launch of minikube, no error messages

How to reproduce it (as minimally and precisely as possible):

  • set up clean Fedora 28 Workstation inside VMware Workstation 12
    -- enable emulation of CPU virtualization instructions
    -- in the .vmx, add line apic.xapic.enabled = "FALSE"
  • sudo dnf install libvirt-daemon-kvm qemu-kvm
  • edit /etc/modprobe.d/kvm.conf to enable nested KVM: options kvm_intel nested=1 (I have an Intel CPU; see the verification sketch after this list)
  • minikube config set vm-driver kvm2
  • minikube delete
  • minikube start
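
As a quick sanity check (a minimal sketch on my part, assuming an Intel CPU and the kvm_intel module), the nested setting can be verified before starting minikube:

cat /sys/module/kvm_intel/parameters/nested              # should print Y (or 1) once nested=1 is active
sudo modprobe -r kvm_intel && sudo modprobe kvm_intel    # reload so the kvm.conf option takes effect (only possible while no VMs are running)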

Anything else we need to know:

This is a pristine setup in which I am doing as little as possible to get Kubernetes minikube running inside KVM on Fedora 28, itself inside VMware Workstation. All configuration is kept as close to the defaults and as controlled as humanly possible, to maximize my chances of success in learning to use minikube.

minikube starts successfully with a changed KVM virtual machine CPU configuration, specifically

--- minikube-original.txt       2018-07-09 11:48:52.436387668 +0200
+++ hypervisor-default.txt      2018-07-09 11:49:20.776574169 +0200
@@ -15,7 +15,6 @@
     <apic/>
     <pae/>
   </features>
-  <cpu mode='host-passthrough' check='none'/>
   <clock offset='utc'/>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>

i.e. switching the CPU mode from the minikube default ("host-passthrough") to "Hypervisor default" avoids the error message.

The minikube KVM virtual machine configuration can be dumped using sudo virsh dumpxml minikube.
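
A possible workflow (a sketch only; the file name is illustrative, and minikube may regenerate the domain definition on the next start):

sudo virsh dumpxml minikube > minikube-original.txt    # capture the generated definition
sudo virsh edit minikube                               # adjust the <cpu .../> element as in the diff above
sudo virsh start minikube                              # or let minikube start bring the machine up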

Another working KVM virtual machine CPU configuration is

--- minikube-original.txt       2018-07-09 11:48:52.436387668 +0200
+++ copy-host-CPU-configuration.txt     2018-07-09 11:49:40.980707124 +0200
@@ -15,7 +15,9 @@
     <apic/>
     <pae/>
   </features>
-  <cpu mode='host-passthrough' check='none'/>
+  <cpu mode='host-model' check='partial'>
+    <model fallback='allow'/>
+  </cpu>
   <clock offset='utc'/>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>

virt-manager (sudo dnf install virt-manager) offers a convenient means to inspect the virtual machine.

@shoffmeister
Author

Refer to https://libvirt.org/formatdomain.html#elementsCPU for documentation on the XML

I am logging this defect against minikube, in part because I cannot find a means to control (via minikube configuration) the configuration of the VM that minikube creates on minikube start. Allowing the CPU mode to be configured via minikube config set would work; beyond that, "hypervisor default" as the default CPU would appear to be the right choice (instead of implicitly hard-coding passthrough, as is the case right now).

@shoffmeister
Author

FWIW, I am not deeply familiar with KVM, QEMU, or native virtualization on Linux. Please provide pointers on where to go and which incantations to use to get the problem addressed at a more fundamental level.

@shoffmeister
Author

https://bugs.launchpad.net/qemu/+bug/1661386 is an item tracked by QEMU; the QEMU developers believe that this is fundamentally due to a VMware problem (https://communities.vmware.com/thread/592140).

Still, it would be useful if minikube allowed configuring the CPU type for the virtualization - right now, minikube implicitly uses "host-passthrough", while the hypervisor-default CPU or just host-model (see above) would work.

@tstromberg tstromberg changed the title from "kvm crash due to minikube VM configuration - kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed." to "nested virt: kvm crash kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed." on Sep 19, 2018
@tstromberg tstromberg added the kind/bug, cause/vm-networking (Startup failures due to VM networking), os/linux, and co/kvm2-driver labels on Sep 19, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.) label on Dec 18, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 17, 2019
@tstromberg tstromberg added the cause/nested-vm-config and priority/awaiting-more-evidence labels and removed the cause/vm-networking, lifecycle/rotten, and os/linux labels on Jan 23, 2019
@luck02

luck02 commented Feb 10, 2019

If you're on VMware, I found that enabling CPU counters helped.
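
For reference (this is my assumption about what "CPU counters" refers to here): in VMware Workstation this appears to be the per-VM option "Virtualize CPU performance counters", which corresponds to a .vmx line along these lines:

vpmc.enable = "TRUE"    # assumed .vmx equivalent of the "Virtualize CPU performance counters" checkbox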

@mphz

mphz commented Mar 13, 2019

If you're on VMware, I found that enabling CPU counters helped.

What do you mean by enabling the CPU counters? I am using VirtualBox and have 3 CPUs for my CentOS 7 VM.

@mydockergit

If you're on VMware, I found that enabling CPU counters helped.

Thanks, I had a similar issue when trying to start minishift, and it solved it.
My issue.

@tstromberg
Contributor

tstromberg commented May 22, 2019

minikube v1.1 will now recommend upgrading to QEMU 3.1 or higher if this error arises, which should address most cases.

Thanks for the tip on the VMware CPU counters, @luck02.

@michalgarcarz

michalgarcarz commented Jun 22, 2019

Hello Team,
The issue is still there with QEMU 2.11.1 and minikube 1.1.1.
I cannot enable the VMware CPU counters because I am using a cluster with EVC on VMware (counters with EVC are not supported in vCenter 6.5).
Also, upgrading to QEMU 3.x is very (way too) risky and painful (and I still have doubts whether that would solve this issue).

The mentioned workaround of reconfiguring the CPU mode does not work. I tried to modify /etc/libvirt/qemu/minikube.xml as presented (both options); the VM then starts (virsh shows it as running) but seems locked, I cannot access the console, and minikube gives up after 120s with the error:

Unable to start VM: start: Machine didn't return an IP after 120 seconds
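
One note on the attempted edit (general libvirt behaviour, not minikube-specific, so treat this as a sketch): changes written directly to /etc/libvirt/qemu/minikube.xml are not picked up until the domain is redefined, so the modification may never have reached the running VM. Something like this is usually needed instead:

sudo virsh edit minikube                            # edit and redefine the domain in one step
sudo virsh define /etc/libvirt/qemu/minikube.xml    # or re-read the hand-edited file

Note too that minikube may regenerate the domain definition on the next minikube start, overwriting the change.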

@medyagh
Member

medyagh commented Jul 23, 2020

I just had this issue myself on Ubuntu 20.04.
I had followed the docs on the Ubuntu site to install KVM, and it gave me this error:


jenkins@mini-test-11-ubuntu:~$ minikube start --driver=kvm2
😄  minikube v1.12.1 on Ubuntu 20.04
✨  Using the kvm2 driver based on user configuration
👍  Starting control plane node minikube in cluster minikube
🔥  Creating kvm2 VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
🔥  Deleting "minikube" in kvm2 ...
🤦  StartHost failed, but will try again: creating host: create: Error creating machine: Error in driver during machine creation: error creating VM: virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-07-23T00:16:37.741286Z qemu-system-x86_64: error: failed to set MSR 0x48b to 0x11582e00000000
qemu-system-x86_64: /build/qemu-74sXTC/qemu-4.2/target/i386/kvm.c:2680: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')
🔥  Creating kvm2 VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
😿  Failed to start kvm2 VM. "minikube start" may fix it: creating host: create: Error creating machine: Error in driver during machine creation: error creating VM: virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-07-23T00:16:47.343106Z qemu-system-x86_64: error: failed to set MSR 0x48b to 0x11582e00000000
qemu-system-x86_64: /build/qemu-74sXTC/qemu-4.2/target/i386/kvm.c:2680: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')

❌  [KVM2_FAILED_MSR] error provisioning host Failed to start host: creating host: create: Error creating machine: Error in driver during machine creation: error creating VM: virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-07-23T00:16:47.343106Z qemu-system-x86_64: error: failed to set MSR 0x48b to 0x11582e00000000
qemu-system-x86_64: /build/qemu-74sXTC/qemu-4.2/target/i386/kvm.c:2680: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')
💡  Suggestion: Upgrade to QEMU v3.1.0+, run 'virt-host-validate', or ensure that you are not running in a nested VM environment.

I fixed it by running

sudo apt-get update

and then the following, after which it worked:

$ sudo apt install qemu-kvm libvirt-clients libvirt-daemon-system bridge-utils virt-manager

jenkins@mini-test-11-ubuntu:~$ minikube start --driver=kvm2
😄 minikube v1.12.1 on Ubuntu 20.04
✨ Using the kvm2 driver based on user configuration
👍 Starting control plane node minikube in cluster minikube
🔥 Creating kvm2 VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
🐳 Preparing Kubernetes v1.18.3 on Docker 19.03.12 ...
🔎 Verifying Kubernetes components...
🌟 Enabled addons: default-storageclass, storage-provisioner
🏄 Done! kubectl is now configured to use "minikube"
💗 Kubectl not found in your path
👉 You can use kubectl inside minikube. For more information, visit https://minikube.sigs.k8s.io/docs/handbook/kubectl/
💡 For best results, install kubectl: https://kubernetes.io/docs/tasks/tools/install-kubectl/

@medyagh medyagh reopened this Jul 23, 2020
@medyagh
Member

medyagh commented Jul 23, 2020

We need to update our docs to say to install virt-manager:

$ sudo apt-get update
$ sudo apt install qemu-kvm libvirt-clients libvirt-daemon-system bridge-utils virt-manager
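
A follow-up check I would suggest (a sketch; the libvirt group name may differ by distribution):

$ virt-host-validate                # verifies /dev/kvm, cgroup and IOMMU support
$ sudo usermod -aG libvirt $USER    # assumed group name; log out and back in afterwards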

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label on Oct 21, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Nov 20, 2020
@priyawadhwa

Our docs point to installation instructions for various Linux distros: https://minikube.sigs.k8s.io/docs/drivers/kvm2/

In the one for Ubuntu, it mentions installing virt-manager, so I'm going to go ahead and close this issue.
