Fix CPU compatibility problem by set cpu_mode to host-model #82

Draft: wants to merge 1 commit into base: main
albinsun (Contributor) commented May 8, 2024

Changes

  1. Fix live migration failure caused by CPU incompatibility on emulated VM CPUs
    • Set libvirt.cpu_mode to compatibility-oriented host-model (default value) instead of performance-oriented host-passthrough
    • See https://libvirt.org/formatdomain.html#cpu-model-and-topology

      ... However, for backward compatibility host-model may be implemented even for domains running on emulated CPUs in which case the best CPU the hypervisor is able to emulate may be used rather then trying to mimic the host CPU model.

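As a rough illustration of what the two modes look like at the libvirt level, the sketch below builds the `<cpu>` element of a domain XML for either mode. The `mode` attribute and its `host-model` and `host-passthrough` values come from the libvirt documentation linked above; the helper name `cpu_element` is hypothetical and is not part of this PR.

```python
import xml.etree.ElementTree as ET

# The two CPU modes discussed in this PR (both are documented libvirt values).
SUPPORTED_MODES = ("host-model", "host-passthrough")


def cpu_element(mode: str = "host-model") -> str:
    """Return a libvirt <cpu> element for the given mode as an XML string.

    Hypothetical helper for illustration only; the PR itself only changes
    the libvirt.cpu_mode setting.
    """
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported cpu mode: {mode!r}")
    cpu = ET.Element("cpu", {"mode": mode})
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    print(cpu_element())                    # compatibility-oriented default
    print(cpu_element("host-passthrough"))  # performance-oriented
```

With host-model, libvirt picks a named CPU model close to the host, which is generally easier to reproduce on a migration target than the exact host CPU exposed by host-passthrough.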

Issue

Ref. [BUG] Live migration fail when upgrade v1.2.1 to v1.2.2-rc2 due to virError

Guest VM live migration fails on Harvester because the guest CPU doesn't match the specification: the feature flag waitpkg is missing.

VirtualMachineInstance migration uid 5de2134c-25e2-404e-88b2-9307f54866c8 failed. reason:
Live migration failed error encountered during MigrateToURI3 libvirt api call: 
virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: waitpkg')
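When triaging failures like the one above, it can help to pull the missing feature names out of the error text. The following is a minimal sketch that assumes the message keeps the "missing features: ..." wording shown in this log and that multiple features, if any, are separated by commas or spaces (an assumption, since only a single feature appears here). The function name `missing_cpu_features` is hypothetical.

```python
import re

# Assumes the wording "missing features: <names>" seen in the log above.
_MISSING = re.compile(r"missing features:\s*([\w.\-, ]+)")


def missing_cpu_features(message: str) -> list[str]:
    """Extract CPU feature names from a libvirt 'missing features' error."""
    match = _MISSING.search(message)
    if match is None:
        return []
    # Split on commas and/or whitespace and drop empty fragments.
    return [name for name in re.split(r"[,\s]+", match.group(1).strip()) if name]


if __name__ == "__main__":
    error = (
        "virError(Code=9, Domain=31, Message='operation failed: "
        "guest CPU doesn't match specification: missing features: waitpkg')"
    )
    print(missing_cpu_features(error))
```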

Cause

Ref. harvester/harvester#5755 (comment)

QEMU's behavior changed between SLES SP4 and SP5. The issue happens when Harvester nodes run in VMs and the guests are nested VMs. Here are the words from the virtualization team:

but the bug is rather that you see the waitpkg flag in SP4, more than the fact that you don't see it in SP5

yes, SP5's QEMU behavior is correct, i.e., on your particular hardware, it's ok to not advertise that flag in a nested VM. It's actually SP4's QEMU that is at fault, i.e., it shouldn't advertise it in the first place, while instead it did. As I said, I can backport the fix to SP's QEMU, but this won't probably help you for that particular VM (or it would break it in even worse way, when/if the updated QEMU would reach SP4's KubeVirt)
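To compare what a given (nested) node actually advertises, one option is to inspect the CPU flags it reports. The sketch below parses text in the format of Linux's /proc/cpuinfo on x86 and returns the set of flags, so the presence of waitpkg can be checked on source and target nodes. The function name `cpu_flags` is hypothetical, and this is only a diagnostic aid, not something the PR implements.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU flags from /proc/cpuinfo-style text (first CPU entry)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    # On an x86 Linux node; the path is standard, the check is illustrative.
    with open("/proc/cpuinfo", encoding="utf-8") as handle:
        flags = cpu_flags(handle.read())
    print("waitpkg advertised:", "waitpkg" in flags)
```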

albinsun changed the title from "Change cpu_mode to default value (currently host-model) for reliability." to "Fix CPU compatibility problem by set cpu_mode to host-model" on May 8, 2024

bk201 (Member) commented May 10, 2024

How about putting this to a setting and default it to host-passthrough?
So for machines with the issue, we can edit the setting.

votdev (Member) commented May 13, 2024

How about putting this to a setting and default it to host-passthrough? So for machines with the issue, we can edit the setting.

In this case, a FAQ entry is necessary to help users identify the problem and its root cause. Otherwise, we end up with a settings option that nobody knows exactly what it is for or when to use it.
