-
Notifications
You must be signed in to change notification settings - Fork 56
Clarify uses of kernel_start parameter #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
82fcaeb
to
c814ca0
Compare
src/loader/x86_64/elf/mod.rs
Outdated
// build by the value of CONFIG_PHYSICAL_START). Therefore, only attempt | ||
// to use PVH if an offset from the default load address has not been | ||
// requested using the kernel_offset parameter. | ||
if phdr.p_type == elf::PT_NOTE && kernel_offset.is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling this parameter like so makes for implicit decisions. I find it better in code to let the user have explicit decisions.
With this approach it is easy to mis-use the function and not understand what you did wrong when it happens.
The implicit decision here is:
- if you pass kernel_offset than this must also mean that you don't want PVH even though p_type is PVH
An explicit decision is:
- p_type is PVH => you want to parse PVH
To keep the decision explicit what we can do instead is to ignore kernel_offset for PVH. I looks a bit like this is what was happening before. We can be explicit in the documentation as well and say that for PVH kernel_offset
is ignored.
What do you say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your point of letting the user make the decisions, but I am not really sure what is the best way to do it in this case. I'm on record saying that kernel_offset
does not have any real use cases for the Elf
implementation and it should be an unused parameter, but if we still want to keep it around then it means that PVH can not be returned as an option. The VMM "knows" it is not using PVH, since the return type from the load()
method has a pvh_entry_addr
field that is None
, but it does not know why it can't.
We can't exactly implement what you suggest above, so let me try to clarify how the PVH boot selection works, and we might be able to come up with a better idea.
The short explanation is that the VMM can not request when it calls loader::elf::Elf::load()
that PVH is used; it does not know yet if it is available. The availability of PVH is an extra choice, a "bonus feature", that is provided alongside the default boot protocol. But the VMM only knows if it is available after load()
returns. I didn't provide a way for VMMs to request exclusively that PVH is available or otherwise fail, since that decision can be made at the VMM level.
We cannot ignore the kernel_offset
request from the VMM and still provide the two options for the VMM to decide, since kernel_offset
affects how the kernel is loaded in memory. So I opted for honoring the request for kernel_offset
which means using Linux boot protocol. I will add a comment on the parameter description that mentions the effect it has on PVH (it disables it). I didn't want to declare the kernel_offset/kernel_start
parameter unused in the Elf case, since I can't prove 100% that is not needed.
Hopefully that made sense. Sorry for the long wall of text :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more details on how the code works, mostly expanding on the previous comment. Feel free to ignore:
The VMM cannot directly request PVH boot, i.e. there is not "p_type is PVH" selection by the VMM. It simply requests to load a kernel that is in the ELF format by calling the loader::elf::Elf::load()
method.
The ELF header has a field (e_entry
) which holds the default entry point for the executable. i.e. this is the virtual address that is set in %rip
by the VMM configuration methods, and the guest will start executing code at this address when it is launched. Before PVH support, this value in e_entry
was the only entry point that could be used to boot a guest with an ELF (vmlinux) kernel. I'll call this "Linux boot", and this is the way it currently works in Firecracker:
https://github.com/firecracker-microvm/firecracker/blob/32c5bc3360cb3ee4d836faaa580e5fbfb314ef66/src/kernel/src/loader/mod.rs#L145
The PVH code changes that were merged earlier do not replace the mechanism above, they just add an alternative method, so the VMM now has two choices. Besides returning the default entry point contained in e_entry
, the load() method now also returns to the VMM an optional field called pvh_entry_addr
, which if present, tells the VMM that there is another address in the kernel where it can start executing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if this is too pedantic, but I think we should actually write the pvh_entry_addr
field of the result if kernel_offset
is present, but its value is 0.
Another thought was to use an enum with three variants for pvh_entry_addr
(i.e. present
/none
/ignored
) instead of just an Option
, but I couldn't think of a scenario where we'd actively look for the distinction, so that probably doesn't add that much value (might be useful for debugging though). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if this is too pedantic, but I think we should actually write the
pvh_entry_addr
field of the result ifkernel_offset
is present, but its value is 0.
That is a good point that I didn't considered initially. But it seems to me that it contradicts the assertion that if kernel_offset
is requested, then we do not consider PVH at all. We would be adding a special case in the documented behavior with very little advantage.
That is my current take on it, but if you think that it has any use cases or makes things simpler from the VMM side let me know and I'll make the change.
Another thought was to use an enum with three variants for
pvh_entry_addr
(i.e.present
/none
/ignored
) instead of just anOption
, but I couldn't think of a scenario where we'd actively look for the distinction, so that probably doesn't add that much value (might be useful for debugging though). WDYT?
When adding PVH support I was trying to make minimal changes, and also didn't foresee that kernel_start
would be a problem, so an Option
was enough.
The downsides I see with using the enum are:
- It would be a breaking change for VMMs (e.g. changes are needed in Cloud Hypervisor and rust-hypervisor-firmware)
- The
ignored
variant would not be needed if there are future Linux kernel changes to allow PVH to work withkernel_offset
Again, I would have to defer to you and others on the discussion more experienced with the VMM implementations as to whether having an enum would make things better for the current main consumer (ICH) and future ones (Firecracker). I would opt to keep it simple and keep using the Option type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One note on breaking changes: this crate is not even published yet, so I would like to make sure we have the appropriate interface before publishing it. I don't think we have any contract that we will not break the interface before (or even after) publishing. We are working with 0.x.x versions, so we aren't advertising it as stable.
@sameo @rbradford @sboeuf any thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply. Regarding the different behavior for kernel_offset == 0
, I was thinking that maybe the value of the offset is specified programmatically in some cases, and when it's 0 the logic can detect that PVH boot is also available as an option and maybe use it. I guess it's just a synthetic thought experiment at this point, so the way things are right now in the PR is also fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point above, but I think the interface is simpler to document and understand for the users of the crate when:
kernel_offset.is_none()
==> PVH is not available
kernel_offset.is_some()
==> PVH is available
This is predictable and clearly defined. If we add a special case for kernel_offset == 0
, that blurs the boundaries between those cases. A user of the crate now needs implementation specific knowledge to understand why kernel_offset == 0
behaves differently.
If the user wants to look into the implementation details then it should be clear enough to understand that kernel_offset = Some(0)
is the same as kernel_offset = None
(when it comes to controlling where the kernel is loaded), but you can use PVH with the second variant.
So I'll keep the current approach, since you also mentioned above that you are fine with it. This is also a short change that could be added quickly if the need arises later.
src/loader/x86_64/elf/mod.rs
Outdated
// build by the value of CONFIG_PHYSICAL_START). Therefore, only attempt | ||
// to use PVH if an offset from the default load address has not been | ||
// requested using the kernel_offset parameter. | ||
if phdr.p_type == elf::PT_NOTE && kernel_offset.is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if this is too pedantic, but I think we should actually write the pvh_entry_addr
field of the result if kernel_offset
is present, but its value is 0.
Another thought was to use an enum with three variants for pvh_entry_addr
(i.e. present
/none
/ignored
) instead of just an Option
, but I couldn't think of a scenario where we'd actively look for the distinction, so that probably doesn't add that much value (might be useful for debugging though). WDYT?
c814ca0
to
965b14a
Compare
2b586ed
to
6ce416e
Compare
Added changes implementing @alexandruag idea of using an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Nice work @aljimenezb
b1246be
to
95e4394
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Can you please add something like Related to https://github.com/rust-vmm/linux-loader/issues/12
to the message body of each commit? This is a guideline we're trying to implement right now to help with keeping track of things.
src/loader/x86_64/elf/mod.rs
Outdated
@@ -121,7 +154,9 @@ impl Elf { | |||
impl KernelLoader for Elf { | |||
/// Loads a kernel from a vmlinux elf image into guest memory. | |||
/// | |||
/// The kernel is loaded into guest memory at offset `phdr.p_paddr` specified by the elf image. | |||
/// By default, the kernel is loaded into guest memory at offset `phdr.p_paddr` specified by the elf image. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a style comment: can you please reformat these comment lines to keep them under 100 chars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks for the heads up. For some reason my editor (VSCode) does not warn me about those.
The load() method in the KernelLoader trait takes a kernel_start argument that has different behavior in the Elf, BzImage, and PE implementations. Elf: kernel_start provides an offset to be added to the default kernel load address encoded in the phdr.p_paddr field of the Elf header. A VMM using kernel_start requests that the kernel is loaded in guest memory at phdr.p_paddr + kernel_start. BzImage: kernel_start allows to specify an absolute address at which to load the kernel in guest memory, replacing the default address taken from the code32_start field of the bzImage header. PE: kernel_start specifies a 2MB-aligned base address in guest memory. The kernel must be loaded at kernel_start + text_offset, where text_offset value is read from the Image header. Change the name from kernel_start to kernel_offset, and change the argument description on each implementation to clarify its use. Related to rust-vmm#12 Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
According to https://www.kernel.org/doc/Documentation/arm64/booting.txt: "The Image must be placed text_offset bytes from a 2MB aligned base address anywhere in usable system RAM and called there." For the aarch64 case, the kernel_offset parameter allows a VMM to specify the desired base address at which to load the image in guest memory, while text_offset is contained in the Image header. Ensure that the base address requested by the VMM meets the 2 MB alignment requirement. Related to rust-vmm#12 Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Refactor the PVH code to return a field in KernelLoaderResult that specifies whether the PVH capability is present or not, or if it is being purposedly ignored. The only current reason for ignoring the PVH capability is when a kernel_offset is requested. Since PVH boot protocol expects the kernel to be loaded at the default load address, it cannot be used when a specific load offset has been requested by the VMM using the kernel_offset parameter. Related to rust-vmm#12 Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Done. Added it right before the Signed-off line. |
95e4394
to
69ab5d1
Compare
As part of the discussion in #12, we have noticed that the
kernel_start
parameter has different usages in the implementations forPE
,Elf
, andBzImage
. TheElf
implementation is specially confusing, sincekernel_start
is used to specify an offset to add to the default load address of the kernel. The discussion is ongoing to determine whether this has any real use cases or provides any security benefits.In the meantime, I propose to rename the parameter to
kernel_offset
, which might be more appropriate to describe its usage by the implementers ofKernelLoader
.Also, since PVH code expects the kernel to be loaded at its default load address, it cannot be used to boot guests if a non-default load address is requested by using the
kernel_offset
argument. Ifkernel_offset
is requested, we simply avoid looking for a PVH entry point.Resolves #12