PVM guest kernel hang on AMD virtual machine #3

@zhuangel

Description

Booting the demo VM on an AMD Zen 2 virtual machine (on which PCID is disabled) hangs.

Steps to reproduce

  1. Build the PVM host kernel and the PVM guest kernel
    Following the guide pvm-get-started-with-kata.md, install the PVM host kernel in an AMD Zen 2 virtual machine.

  2. Prepare the PVM VM resources from the guide
    cloud-hypervisor v37
    VM image from the guide

  3. Start the PVM VM
    Start the PVM VM on the AMD Zen 2 virtual machine
    cloud-hypervisor.v37 \
    --api-socket ch.sock \
    --log-file vmm.log \
    --cpus boot=1 \
    --kernel vmlinux.virt-pvm-guest \
    --cmdline 'console=ttyS0 root=/dev/vda1 rw clocksource=kvm-clock pti=off' \
    --memory size=1G,hugepages=off,shared=false,prefault=off \
    --disk id=disk_0,path=ubuntu-22.04-pvm-kata.raw \
    -v --console off --serial tty

  4. The PVM VM hangs
    There is no output on the serial console, and the VMM process CPU usage is very low (so it is not spinning in a busy loop). After enabling all KVM tracepoints, I found that emulation of an MSR read failed (index 0xc0011020, MSR_AMD64_LS_CFG) and PVM injected #GP into the PVM VM.
    The PVM VM then hangs in early_fixup_exception, because CS is __KERNEL32_CS.

vcpu0-3676813 [000] d..1. 275659.054772: kvm_exit: vcpu 0 reason GP excp rip 0xffffd97f81051232 info1 0x000000000000000d info2 0x0000000000000000 intr_info 0x0000000d error_code 0x00000000
vcpu0-3676813 [000] ..... 275659.054774: kvm_emulate_insn: 0:ffffd97f81051232:0f 32 (prot64)
vcpu0-3676813 [000] ..... 275659.054774: kvm_msr: msr_read c0011020 = 0x0 (#GP)
vcpu0-3676813 [000] ..... 275659.054775: kvm_inj_exception: #GP (0x0)
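
When the trace is long, the rejected MSR read can be picked out mechanically. The following is a minimal sketch, assuming the kvm_msr line format shown in the trace above; parse_failed_msr_read is a hypothetical helper written for this report and is not part of KVM or the PVM tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a KVM API): returns true when `line` is a
 * kvm_msr trace record for an MSR read that KVM rejected with #GP,
 * and stores the MSR index in *msr_index.
 * Assumes the record format seen in the trace above:
 *   kvm_msr: msr_read <hex index> = <value> (#GP)
 */
bool parse_failed_msr_read(const char *line, unsigned int *msr_index)
{
    /* Skip the task/cpu/timestamp prefix and find the event itself. */
    const char *event = strstr(line, "kvm_msr: msr_read ");
    if (event == NULL)
        return false;

    /* Only rejected accesses carry the "(#GP)" suffix. */
    if (strstr(event, "(#GP)") == NULL)
        return false;

    /* The index is printed as bare hex, without a 0x prefix. */
    return sscanf(event, "kvm_msr: msr_read %x", msr_index) == 1;
}
```

Feeding each line of the trace to this helper should report index 0xc0011020 for the kvm_msr record above and ignore the kvm_exit, kvm_emulate_insn and kvm_inj_exception records.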

Workaround

I am not sure why CS is __KERNEL32_CS in early_fixup_exception. Maybe the PVM kernel needs a detection similar to xen_pv_domain, so that it can finish the fixup process.
I can work around the issue with the following fix:

--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -322,7 +322,7 @@ void __init early_fixup_exception(struct pt_regs *regs, int trapnr)
 	 * the 486 DX works this way.
 	 * Xen pv domains are not using the default __KERNEL_CS.
 	 */
-	if (!xen_pv_domain() && regs->cs != __KERNEL_CS)
+	if (!xen_pv_domain() && regs->cs != __KERNEL_CS && regs->cs != __KERNEL32_CS)
 		goto fail;
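
The effect of the one-line change can be modelled in user space. This is only a sketch under stated assumptions: the selector values follow the usual x86-64 GDT layout (GDT index 1 for __KERNEL32_CS and index 2 for __KERNEL_CS, each multiplied by 8), xen_pv_domain() is replaced by a plain boolean parameter, and the SKETCH_* macros and cs_accepted_* functions are hypothetical names, not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 selector values: GDT index * 8, RPL 0. */
#define SKETCH_KERNEL32_CS (1 * 8) /* stands in for __KERNEL32_CS */
#define SKETCH_KERNEL_CS   (2 * 8) /* stands in for __KERNEL_CS */

/*
 * Original check: early_fixup_exception() jumps to `fail` unless the
 * kernel is a Xen PV domain or the saved CS equals __KERNEL_CS.
 */
bool cs_accepted_original(bool xen_pv, uint16_t cs)
{
    return xen_pv || cs == SKETCH_KERNEL_CS;
}

/*
 * Patched check from the workaround: __KERNEL32_CS is accepted too,
 * so the fixup path can continue for the PVM guest.
 */
bool cs_accepted_patched(bool xen_pv, uint16_t cs)
{
    return xen_pv || cs == SKETCH_KERNEL_CS || cs == SKETCH_KERNEL32_CS;
}
```

With xen_pv set to false and CS equal to the __KERNEL32_CS stand-in, the original check rejects the exception while the patched check accepts it. Any other unexpected selector is still rejected by both.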

Labels: bug