Skip to content

Add support for userspace IOAPIC #304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 10, 2025
Merged

Conversation

jakecorrenti
Copy link
Member

@jakecorrenti jakecorrenti commented Apr 6, 2025

Allow the user to specify if they want to split the IRQCHIP such that the LAPIC is still inside the guest, but the IOAPIC is in userspace and managed by the VMM.

Intel Trust Domain Extensions, for example, require this functionality to work.

The Intel specification for the device can be found here: https://pdos.csail.mit.edu/6.828/2016/readings/ia32/ioapic.pdf

@jakecorrenti jakecorrenti force-pushed the split-irq branch 4 times, most recently from 9eb8f7e to 2594125 Compare April 7, 2025 18:07
From https://www.github.com/cloud-hypervisor/cloud-hypervisor:

The kvm API has many structs that resemble the following `Foo`
structure:

```
struct Foo {
   some_data: u32
   entries: __IncompleteArrayField<__u32>,
}
```

In order to allocate such a structure, `size_of::<Foo>()` would be too
small because it would not include any space for `entries`. To make the
allocation large enough while still being aligned for `Foo`, a
`Vec<Foo>` is created. Only the first element of `Vec<Foo>` would
actually be used as a `Foo`. The remaining memory in the `Vec<Foo>` is
for `entries`, which must be contiguous with `Foo`.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
When enabling `KVM_CAP_SPLIT_IRQCHIP`, the LAPIC will be created by the
guest kernel. However, the VMM needs to create an IOAPIC in userspace to
redirect interrupts from the system bus to the respective LAPIC.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
When an interrupt is edge triggered, the Remote IRR bit needs to be set
to 0. `fix_edge_remote_irr` provides an easy way to set that bit.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
When triggering an interrupt in the kernel or commiting the IRQ routes,
the `IoApic` struct needs to access the VM's FD. Provide a function to
send a message to the IRQ worker thread with the specified message.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Each entry in the ioredtbl is a 64-bit bitfield. Make it easier to
access the information in the entry by parsing it into a struct and
determine the MSI information that can be gathered from the entry.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Add a function to update the MSI data for each IRQ route if it's not
masked.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Add function to prepare the entry to be serviced and send message to the
IRQ worker thread.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Implement the `BusDevice` trait for `IoApic` to handle reads and writes
from the system bus.

When reading the `IO_WIN` register, it's important to verify the data is
only 32-bits in size, as the register is 64-bits, but needs to do two
consecutive reads.

Like `read`, `write` needs to verify the data is 32-bits so the
register can be read twice and get the full 64-bits of data. In `write`,
the IO APIC can't write to the Version or Arbitration regiseters,
because those are Read-Only registers on the device.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Provide user API, `krun_split_irqhip`, to specify the VM should have
`KVM_CAP_SPLIT_IRQCHIP` enabled.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
If the user called `krun_split_irqchip` with `enabled` set to `true`,
start an IRQ worker thread to aid in servicing interrupts or commiting
IRQ routes.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
If the user calls `krun_split_irqchip` with `enabled` set to `true`,
then create an `IoApic` device in userspace rather than creating an
IOAPIC in the guest with KVM.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
@jakecorrenti jakecorrenti marked this pull request as ready for review April 7, 2025 19:49
@jakecorrenti
Copy link
Member Author

Are there any Copyright requirements if I used the QEMU C implementation for BusDevice::{read, write} but translated it to Rust?

@tylerfanelli
Copy link
Member

Are there any Copyright requirements if I used the QEMU C implementation for BusDevice::{read, write} but translated it to Rust?

I don't believe so. You could make a note such as "inspired by QEMU's bus device implementation" or something of the sort, if you'd like.

@@ -1521,6 +1529,54 @@ pub extern "C" fn krun_start_enter(ctx_id: u32) -> i32 {
.unwrap();
}

#[cfg(target_arch = "x86_64")]
if ctx_cfg.vmr.split_irqchip {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this and the GPU thread to some helper method in Vmm, but let's do that in a different PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I was planning on doing it once this got merged, like you suggested

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guest_memfd mappings should be moved to this thread as well.

Copy link
Collaborator

@slp slp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work here. +1

@jakecorrenti jakecorrenti mentioned this pull request Apr 10, 2025
@tylerfanelli tylerfanelli merged commit a014ac5 into containers:main Apr 10, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants