qemu fails to start when more than 8 CPUs are set #2190

Open
johnklehm opened this issue May 7, 2022 · 6 comments

@johnklehm

johnklehm commented May 7, 2022

Actual Behavior

If I request 9 cores in the Kubernetes Settings menu, Rancher Desktop fails to start the cluster.
The culprit shows up in ha.stderr.log:

{"level":"debug","msg":"qemu[stderr]: qemu-system-aarch64: Number of SMP CPUs requested (9) exceeds max CPUs supported by machine 'mach-virt' (8)","time":"2022-05-06T21:09:57-05:00"}
{"error":"exit status 1","level":"info","msg":"QEMU has exited","time":"2022-05-06T21:09:57-05:00"}

Hoping we can get the CPU slider to restrict the number of CPUs to the lesser of the available core count and the max supported by qemu.

Steps to Reproduce

  1. Install Rancher Desktop 1.3.0 on an M1 Pro
  2. Select 9 CPUs in the Kubernetes Settings Tab of the Rancher Desktop Preferences Menu
  3. When rancher restarts k8s to use the new settings you'll see the failure messages I posted below.

Result

Kubernetes Error
Rancher Desktop 1.3.0 - darwin (x64)
Error Starting Kubernetes
Error: /Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/limactl exited with code 1
Last command run:
/Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/limactl start --tty=false 0

Context:
Starting virtual machine

Some recent logfile lines:
time="2022-05-06T21:27:50-05:00" level=info msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
time="2022-05-06T21:27:50-05:00" level=info msg="[hostagent] QEMU has exited"
time="2022-05-06T21:27:50-05:00" level=fatal msg="exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see \"/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/ha.stderr.log\")"
2022-05-07T02:27:50.996Z: + limactl start --tty=false 0
2022-05-07T02:27:50.997Z: Error: /Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/limactl exited with code 1
2022-05-07T02:27:51.001Z: Error starting lima: Error: /Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/limactl exited with code 1
    at ChildProcess.<anonymous> (/Applications/Rancher Desktop.app/Contents/Resources/app.asar/dist/app/background.js:17:141690)
    at ChildProcess.emit (node:events:390:28)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)

Full ha.stderr.log:

➜  ~ less "/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/ha.stderr.log"

{"level":"warning","msg":"This version of QEMU might not be able to boot recent Linux guests on M1 macOS hosts.Reinstall QEMU with the following commits (included in QEMU 7.0.0):\n- https://github.com/qemu/qemu/commit/ad99f64f \"hvf: arm: Use macros for sysreg shift/masking\"\n- https://github.com/qemu/qemu/commit/7f6c295c \"hvf: arm: Handle unknown ID registers as RES0\"\nSee https://github.com/Homebrew/homebrew-core/pull/96743 for the further information.","time":"2022-05-06T21:09:57-05:00"}
{"level":"warning","msg":"field `firmware.legacyBIOS` is not supported for architecture \"aarch64\", ignoring","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"firmware candidates = [/Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/share/qemu/edk2-aarch64-code.fd /usr/share/AAVMF/AAVMF_CODE.fd /usr/share/qemu-efi-aarch64/QEMU_EFI.fd]","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"OpenSSH version 8.6.1 detected","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"AES accelerator seems available, prioritizing aes128-gcm@openssh.com and aes256-gcm@openssh.com","time":"2022-05-06T21:09:57-05:00"}
{"level":"info","msg":"Starting QEMU (hint: to watch the boot progress, see \"/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/serial.log\")","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"qCmd.Args: [/Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/qemu-system-aarch64 -m 10240 -cpu host -machine virt,accel=hvf,highmem=off -smp 9,sockets=1,cores=9,threads=1 -drive if=pflash,format=raw,readonly=on,file=/Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/share/qemu/edk2-aarch64-code.fd -boot order=d,splash-time=0,menu=on -drive file=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/basedisk,media=cdrom,readonly=on -drive file=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/diffdisk,if=virtio -cdrom /Users/jklehm/Library/Application Support/rancher-desktop/lima/0/cidata.iso -netdev user,id=net0,net=192.168.5.0/24,dhcpstart=192.168.5.15,hostfwd=tcp:127.0.0.1:56224-:22 -device virtio-net-pci,netdev=net0,mac=52:55:55:90:40:85 -netdev vde,id=net1,sock=/private/var/run/rancher-desktop-shared.ctl -device virtio-net-pci,netdev=net1,mac=52:55:55:5d:29:f1 -netdev vde,id=net2,sock=/private/var/run/rancher-desktop-bridged_en0.ctl -device virtio-net-pci,netdev=net2,mac=52:55:55:e7:ac:38 -device virtio-rng-pci -display none -vga none -device ramfb -device qemu-xhci,id=usb-bus -device usb-kbd,bus=usb-bus.0 -device usb-mouse,bus=usb-bus.0 -parallel none -chardev socket,id=char-serial,path=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/serial.sock,server=on,wait=off,logfile=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/serial.log -serial chardev:char-serial -chardev socket,id=char-qmp,path=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/qmp.sock,server=on,wait=off -qmp chardev:char-qmp -name lima-0 -pidfile /Users/jklehm/Library/Application Support/rancher-desktop/lima/0/qemu.pid]","time":"2022-05-06T21:09:57-05:00"}
{"level":"info","msg":"Waiting for the essential requirement 1 of 5: \"ssh\"","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"executing script \"ssh\"","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"executing ssh for script \"ssh\": /usr/bin/ssh [ssh -F /dev/null -o IdentityFile=\"/Users/jklehm/Library/Application Support/rancher-desktop/lima/_config/user\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^aes128-gcm@openssh.com,aes256-gcm@openssh.com\" -o User=jklehm -o ControlMaster=auto -o ControlPath=\"/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/ssh.sock\" -o ControlPersist=5m -p 56224 127.0.0.1 -- /bin/bash]","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"qemu[stderr]: qemu-system-aarch64: Number of SMP CPUs requested (9) exceeds max CPUs supported by machine 'mach-virt' (8)","time":"2022-05-06T21:09:57-05:00"}
{"error":"exit status 1","level":"info","msg":"QEMU has exited","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"stdout=\"\", stderr=\"ssh: connect to host 127.0.0.1 port 56224: Connection refused\\r\\n\", err=failed to execute script \"ssh\": stdout=\"\", stderr=\"ssh: connect to host 127.0.0.1 port 56224: Connection refused\\r\\n\": exit status 255","time":"2022-05-06T21:09:57-05:00"}
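
As a rough sketch of how this diagnosis could be automated, the TypeScript below scans the JSON lines of ha.stderr.log for the QEMU message shown above and extracts the requested and maximum CPU counts. The function and type names are hypothetical and not part of Rancher Desktop or Lima; only the message text is taken from the log in this report.

```typescript
// Fields read from each JSON line in ha.stderr.log (other fields are ignored).
interface HostAgentLogLine {
  level?: string;
  msg?: string;
  time?: string;
}

// Hypothetical result type for this sketch.
interface SmpLimitError {
  requested: number;
  max: number;
  machine: string;
}

// Matches the QEMU message quoted in this issue.
const SMP_LIMIT_RE =
  /Number of SMP CPUs requested \((\d+)\) exceeds max CPUs supported by machine '([^']+)' \((\d+)\)/;

// Hypothetical helper: returns the first SMP-limit error found, or null.
function findSmpLimitError(logText: string): SmpLimitError | null {
  for (const line of logText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let entry: HostAgentLogLine;
    try {
      entry = JSON.parse(trimmed) as HostAgentLogLine;
    } catch {
      continue; // skip lines that are not valid JSON
    }
    const match = SMP_LIMIT_RE.exec(entry.msg ?? "");
    if (match) {
      return {
        requested: Number(match[1]),
        machine: match[2] ?? "",
        max: Number(match[3]),
      };
    }
  }
  return null;
}
```

A caller could read the log file into a string, pass it to findSmpLimitError, and surface the requested and max values in the error dialog instead of the generic "limactl exited with code 1".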

Expected Behavior

For the cluster to restart with the settings I specified.
For the GUI to only allow me to set a valid configuration.

Additional Information

M1 Pro (10 cores)

Rancher Desktop Version

1.3.0

Rancher Desktop K8s Version

1.23.6

Which container runtime are you using?

containerd (nerdctl)

What operating system are you using?

macOS

Operating System / Build Version

ProductName: macOS ProductVersion: 12.3.1 BuildVersion: 21E258

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

No response

johnklehm added the kind/bug (Something isn't working) label May 7, 2022
@jandubois
Member

It looks like this is a limitation of qemu when using the hvf (Apple's Hypervisor Framework) accelerator to run at native speed.

There is a lot of background information at utmapp/UTM#3180

TL;DR:

  • The limit depends on the version of the qemu "generic interrupt controller" (GIC); the max for GICv2 is 8 CPUs
  • qemu with GICv3 and accel=hvf is not implemented (patch not accepted upstream)
  • Running with more cores, but using CPU emulation defeats the purpose of using more cores

Tasks:

  • For the time being we should limit the number of cores on M1 machines to 8, to avoid this failure.
  • Maybe: Investigate asking for performance cores, to make sure qemu isn't running on the power-saving efficiency cores. This will probably require support from Lima.
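
The first task could look roughly like the TypeScript sketch below: cap the selectable CPU count at the lesser of the host core count and the GICv2 limit of 8 mentioned above. All names here (HostInfo, maxSelectableCpus, clampRequestedCpus) are hypothetical and not taken from the Rancher Desktop code base, and how the host architecture is detected is left to the caller (note that the error dialog in this report says "darwin (x64)" on an M1 Pro, so the app's own process architecture may not be the right signal).

```typescript
// Assumption from this thread: qemu's 'virt' machine with GICv2 accepts at most 8 CPUs.
const QEMU_GICV2_MAX_CPUS = 8;

// Hypothetical description of the host; the caller decides how to fill it in.
interface HostInfo {
  platform: string; // e.g. "darwin"
  arch: string;     // architecture of the machine running the VM, e.g. "arm64"
  cpuCount: number; // number of host cores, e.g. os.cpus().length in Node.js
}

// Largest CPU count the settings UI should offer on this host.
function maxSelectableCpus(host: HostInfo): number {
  const limitedByQemu = host.platform === "darwin" && host.arch === "arm64";
  const cap = limitedByQemu ? QEMU_GICV2_MAX_CPUS : Number.POSITIVE_INFINITY;
  return Math.max(1, Math.min(host.cpuCount, cap));
}

// Clamp a requested value (e.g. from a saved settings file) into the valid range.
function clampRequestedCpus(requested: number, host: HostInfo): number {
  const atLeastOne = Math.max(1, Math.floor(requested));
  return Math.min(atLeastOne, maxSelectableCpus(host));
}
```

With the 10-core M1 Pro from this report, the slider would top out at 8 and a previously saved value of 9 would be reduced to 8 before the VM is started.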

@mayrbenjamin92

Good to know what the root cause is, although it means that on, e.g., a Mac Studio with 20 cores, assigning 14 cores to containerized workloads does not work. Is there any other solution for this?

@jandubois
Member

Is there any other solution for this?

Unfortunately I don't see one right now. I'm not following the qemu mailing list, but it looks like the discussions about these things are somewhat contentious. 😞

I hope that one day we can take a look at using the Apple virtual machine framework as a configurable alternative to qemu, but I have no idea how much work this will be.

@mayrbenjamin92

I just downloaded UTM and spawned an emulated x86_64 Linux VM with 32 GB of memory and 12 cores. So far so good.

@gaktive
Contributor

gaktive commented Sep 13, 2022

@rak-phillip we should have a separate ticket for a UI hard limit with a tooltip for this, e.g. "We know you have 24 CPUs, but based on the VM limitations, it'll be set to 8." or something.

@gaktive gaktive modified the milestones: Next, Later Sep 13, 2022
@agraf

agraf commented Dec 27, 2022

The QEMU issue tracking GICv3 support, which would enable -smp > 8, is https://gitlab.com/qemu-project/qemu/-/issues/743. I would appreciate Tested-by / Reviewed-by tags on the mailing list to push it forward :).
