underhill_core: serialize sidecar VP online #1443

Merged: 7 commits merged on May 31, 2025

@jstarks (Member) commented May 30, 2025

The Linux kernel serializes CPU hotplug. If multiple sidecar VPs need to be onlined into OpenVMM simultaneously, they all stop running the guest while their associated Linux threads call into the kernel to online the CPU, where they block on the CPU hotplug lock (or a similar kernel serialization point).

This means the average blackout time for a VP that's onlined early in boot is linear in the number of early-onlined VPs. With typical device configurations, that number is usually proportional to the total number of VPs, so the blackout grows with the size of the VM. This is a performance problem.
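
The linear scaling can be made concrete with a back-of-the-envelope model. This is an idealization, not a measurement: it assumes all N VPs request online at the same instant and that each serialized kernel online step takes a fixed, hypothetical `online_ms`. Under those assumptions VP k (counting from 1) is stopped for k × `online_ms`, so the average blackout is (N + 1) / 2 × `online_ms`; a VP that keeps running the guest until its turn is stopped for only `online_ms`.

```rust
/// Average blackout (ms) when `n_vps` VPs all stop running the guest at
/// t = 0 and then wait for the kernel to online them one at a time.
/// VP k (1-based) is stopped from t = 0 until k * online_ms.
fn avg_blackout_stop_then_queue(n_vps: u64, online_ms: u64) -> f64 {
    assert!(n_vps > 0, "need at least one VP");
    let total_ms: u64 = (1..=n_vps).map(|k| k * online_ms).sum();
    total_ms as f64 / n_vps as f64
}

/// Average blackout (ms) when each VP keeps running the guest until its
/// turn, so it is only stopped for its own online step.
fn avg_blackout_queue_then_stop(online_ms: u64) -> f64 {
    online_ms as f64
}

fn main() {
    // Hypothetical cost of onlining one CPU in the kernel (assumption).
    let online_ms = 10;
    for n_vps in [4u64, 64, 1024] {
        println!(
            "{n_vps:>4} VPs: stop-then-queue {:>6.1} ms, queue-then-stop {:>4.1} ms",
            avg_blackout_stop_then_queue(n_vps, online_ms),
            avg_blackout_queue_then_stop(online_ms),
        );
    }
}
```

In this model the first column grows with the VP count while the second stays flat, which is the scaling property the change aims for.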

To avoid this, explicitly serialize VP online before the target VP is stopped. This allows the VP to continue running the guest until it reaches the front of the online queue. This reduces the average blackout time to just the time to online one CPU, meaning this solution should scale to any number of VPs.
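
A minimal sketch of the ordering change, assuming a single process-wide lock stands in for the online queue. Everything here is hypothetical: `online_cpu`, `run_guest_slice`, and `online_vp` are illustrative names, not underhill_core APIs, and `thread::sleep` stands in for real work. The point is only the order of operations: the VP waits for its place in line while still running the guest, and stops only once it holds the lock.

```rust
use std::sync::{Arc, Mutex, TryLockError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the serialized kernel CPU online path.
fn online_cpu(_vp: u32) {
    thread::sleep(Duration::from_millis(10));
}

/// Hypothetical stand-in for letting the VP run the guest for a short slice.
fn run_guest_slice() {
    thread::sleep(Duration::from_millis(1));
}

/// Onlines one VP and returns how long it was stopped (not running the guest).
fn online_vp(vp: u32, online_lock: &Mutex<()>) -> Duration {
    // Wait for our turn *before* stopping the VP: while another VP holds
    // the lock, keep running the guest instead of blocking.
    let guard = loop {
        match online_lock.try_lock() {
            Ok(guard) => break guard,
            Err(TryLockError::WouldBlock) => run_guest_slice(),
            Err(TryLockError::Poisoned(p)) => break p.into_inner(),
        }
    };
    // Only now stop the VP, so the blackout covers a single CPU online.
    let stopped_at = Instant::now();
    online_cpu(vp);
    let stopped_for = stopped_at.elapsed();
    drop(guard); // Let the next VP in line proceed.
    stopped_for
}

fn main() {
    let online_lock = Arc::new(Mutex::new(()));
    let handles: Vec<_> = (0..8)
        .map(|vp| {
            let lock = Arc::clone(&online_lock);
            thread::spawn(move || online_vp(vp, &lock))
        })
        .collect();
    for (vp, handle) in handles.into_iter().enumerate() {
        let stopped_for = handle.join().unwrap();
        println!("VP {vp} was stopped for {stopped_for:?}");
    }
}
```

Polling with `Mutex::try_lock` is not a fair FIFO queue, so in this sketch VPs may be onlined out of arrival order; if ordering matters, a fair lock would be needed. Either way, each VP is stopped for roughly one online step rather than for every online step ahead of it.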

@jstarks requested a review from a team as a code owner on May 30, 2025 03:55
@benhillis added the release_2505 (Targets the release/2505 branch) and backport_2505 (Change should be backported to the release/2505 branch) labels and removed release_2505 on May 30, 2025
jstarks added 7 commits May 30, 2025 23:12
The Linux kernel serializes CPU online. If multiple sidecar VPs need
to be onlined into OpenVMM simultaneously, they will all stop running
the guest OS and wait for their turn to be onlined into Linux.
This means the average blackout time for a VP that's onlined early in
boot is linear in the number of early-onlined VPs. And thanks to
typical device configurations, this is usually linear in the total number of VPs.

To avoid this, explicitly serialize VP online _before_ the target VP
is stopped. This allows the VP to continue running the guest until it
reaches the front of the queue. This reduces the average blackout time to
just the time to online one CPU, meaning this solution should scale
to any number of VPs.
@jstarks force-pushed the wait_to_avoid_waiting branch from 5754a31 to 3ffda59 on May 30, 2025 23:13
@jstarks merged commit 31ed3a7 into microsoft:main on May 31, 2025
28 checks passed
@jstarks deleted the wait_to_avoid_waiting branch on May 31, 2025 19:27
Labels
backport_2505 Change should be backported to the release/2505 branch
2 participants