-
Notifications
You must be signed in to change notification settings - Fork 29
efi: support updating multiple EFIs in mirrored setups (RAID1) #855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ddb26d3
to
d4bdbe1
Compare
src/efi.rs
Outdated
if let Some(esp_device) = self.get_esp_device() { | ||
esp_devices.push(esp_device.to_string_lossy().into_owned()); | ||
} else { | ||
esp_devices = blockdev::find_colocated_esps("/").expect("get esp devices"); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered / tried the case with 2 disks / 2 separate OS ? (we do that in our lab, 2 or even 3 rpm-ostree based OS, each one on one disk, all installed via anaconda)
I'm asking because quickly looking at the code, I would use find_colocated_esps("/")
first and get_esp_device()
as a fallback.
The safest might even be:
- mounted ESP(s)
- colocated ESP(s)
- get_esp_device
If you have a bootc installed OS, and on a second disk an anaconda installed OS (not even ostree based), right now you are going to always pick the anaconda ESP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use find_colocated_esps("/") first and get_esp_device() as a fallback.
-
get_esp_device()
will find CoreOS ESP label/dev/disk/by-partlabel/EFI-SYSTEM
, or Anaconda ESP label/dev/disk/by-partlabel/EFI\\x20System\\x20Partition
-
find_colocated_esps("/")
will list ESP partitions on the devices with mountpoint/boot
, but it does not mean it is always mounted, for example, coreos does not mount esp after booted, see doc
Consider that on single disk, can easily get ESP label /dev/disk/by-partlabel/EFI-SYSTEM
via get_esp_device()
, but on multiple disks, it would be /dev/disk/by-label/esp-1
& /dev/disk/by-label/esp-2
, does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safest might even be:
mounted ESP(s)
colocated ESP(s)
get_esp_device
What I did: when running update, get_esp_devices()
would list the all esp devices, then in the loop device will check mounted ESP first, if not, will mount the esp then upgrade. I think you meant that need to check mounted firstly in the beginning, will try that, thanks!
Edit: For multiple disks, we still need the loop for each esp device. Will keep it as it is, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_esp_devices() would list the all esp devices
Maybe I don't understand rust code, but for me it first call get_esp_device()
which returns 1 device based on a weak heuristic, if that fails then it tries to find multiple devices
Here 3 test cases, second OS (sdb) booted, running get_esp_devices()
:
sda: normal anaconda install
sdb: ESP not mounted, not using anaconda/coreos label
-> returns sda ?
sda: normal anaconda install
sdb & sdc: raid 1 mirrored, ESP not mounted, not using anaconda/coreos label
-> returns sda ?
sda: normal coreos install
sdb: also a coreos install, ESP not mounted
-> not sure which PARTLABEL wins, likely random at each boot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have such env, but IMU, if second OS (sdb) booted, sda will not be mounted from the booted OS, in this case, we do not scan sda.
sda: normal anaconda install
sdb: ESP not mounted, not using anaconda/coreos label
skip sda, and get_esp_devices()
will call find_colocated_esps("/")
to find the mount point /boot device, then get ESP part on the same device
sda: normal anaconda install
sdb & sdc: raid 1 mirrored, ESP not mounted, not using anaconda/coreos label
skip sda, this is the case that we need to resolve actually, find_colocated_esps("/")
to find the mount point /boot
device which is /dev/md126
, then need to get the backing devices list sdb & sdc
and find ESP part on each device
sda: normal coreos install
sdb: also a coreos install, ESP not mounted
When I tried to boot VM with 2 coreos, failed with:
[ 9.373774] rdcore[957]: Error: System has 2 devices with a filesystem labeled 'boot': ["/dev/vdb3", "/dev/vda3"]
[FAILED] Failed to start CoreOS Ignition Ensure Unique Boot Filesystem.
So we can skip this env.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm with @champtar on this one. I do think the code in find_colocated_esps()
is a stronger heuristic since it doesn't rely on the labels.
For a36689b#r2000542257 - you are correct that you can't boot a system initially (i.e. Ignition boot) with multiple disks attached with the same boot/root filesystem labels, but there is nothing preventing you from first booting a system and then adding a disk later.
@HuijingHei you argue that sometimes /boot/
isn't mounted. Can we just say if /boot
isn't mounted we'll find the parent devices of /
(or target_root
) and use that instead?
Can we just get rid of get_esp_device()
if we move to find_colocated_esps()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that this makes sense, sorry for the above misunderstanding.
Can I do some check that if there is only 1 disk, then would call get_esp_device()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I do some check that if there is only 1 disk, then would call get_esp_device() ?
You could, but what would be the benefit?
I could see it being beneficial to make that optimization (and thus keep two separate code paths) if the performance was much different between one versus the other and this was an operation that ran often. In this case, though, this isn't something that runs often and I think it would be much better to just have a single code path that we maintain if it can satisfy all of our requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, will update, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just say if /boot isn't mounted we'll find the parent devices of / (or target_root) and use that instead?
Check that findmnt -J -v --output=SOURCE,FSTYPE,OPTIONS,UUID /
will get "source": "composefs"
(which is not real device), would use /sysroot
instead and get "source": "/dev/vda4"
8e26fda
to
57d9de6
Compare
@travier could you help to review when you have time? Thanks! |
Dusty added some logic in openshift/os#1795 about making sure that the ESP that we are finding are part of the same device that the one the /boot partition is in and we should do the same for both the RAID and non RAID case. |
Agree, and what I did in this patch followed the same way. |
src/efi.rs
Outdated
if let Some(esp_device) = self.get_esp_device() { | ||
esp_devices.push(esp_device.to_string_lossy().into_owned()); | ||
} else { | ||
esp_devices = blockdev::find_colocated_esps("/").expect("get esp devices"); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm with @champtar on this one. I do think the code in find_colocated_esps()
is a stronger heuristic since it doesn't rely on the labels.
For a36689b#r2000542257 - you are correct that you can't boot a system initially (i.e. Ignition boot) with multiple disks attached with the same boot/root filesystem labels, but there is nothing preventing you from first booting a system and then adding a disk later.
@HuijingHei you argue that sometimes /boot/
isn't mounted. Can we just say if /boot
isn't mounted we'll find the parent devices of /
(or target_root
) and use that instead?
Can we just get rid of get_esp_device()
if we move to find_colocated_esps()
?
src/efi.rs
Outdated
let sysroot = sysroot.recover_path()?; | ||
|
||
for esp_dev in esp_devices { | ||
let dest_path = if let Some(dest_path) = self.check_mounted_esp(&sysroot)? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand why we needed to create a new check_mounted_esp()
function (i.e. breaking up ensure_mounted_esp()
). If we do really need this maybe we can split it into a separate commit with rationale?
IMO now when we call ensure_mounted_esp(&sysroot, &esp_dev)
we should ensure that the esp_dev
we passed in is the one that is mounted on the mountpoint. i.e. if somehow a different ESP is mounted there then we shouldn't be happy about it.
src/efi.rs
Outdated
@@ -392,24 +434,37 @@ impl Component for Efi { | |||
} | |||
|
|||
fn validate(&self, current: &InstalledContent) -> Result<ValidationResult> { | |||
if !is_efi_booted()? && self.get_esp_device().is_none() { | |||
let esp_devices = self.get_esp_devices(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are calling get_esp_devices()
multiple times. Perhaps we should run it once and store the data so that we aren't going through the same logic (and system calls) over and over again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are used in different subcommands, but agree that need to make the changes to store the data about <parent devices>
, <bios_boot partitions>
, <esp partitions>
, a little tricky that they should be stored at runtime, any suggestions for this?
# mount -o remount,rw /boot && rm -f /boot/bootupd-state.json
# bootupctl update -vvvv
[TRACE bootupd] executing cli
Running as unit: bootupd.service
[TRACE bootupd] executing cli
[TRACE bootupd::bootupd] No saved state
[TRACE bootupd::bootupd] Remaining known components: 2
[DEBUG bootupd::blockdev] Find parent devices: ["/dev/vda"]
[DEBUG bootupd::blockdev] Find bios_boot partitions: ["/dev/vda1"]
[TRACE bootupd::component] Adoptable: ContentMetadata { timestamp: 2025-04-22T07:11:03.919Z, version: "42.20250422.20.0" }
[DEBUG bootupd::blockdev] Find parent devices: ["/dev/vda"]
[DEBUG bootupd::blockdev] Find esp partitions: ["/dev/vda2"]
[TRACE bootupd::efi] No efivars mount at "/sys/firmware/efi/efivars"
[TRACE bootupd::efi] No efivars mount at "/sys/firmware/efi/efivars"
[TRACE bootupd::component] Adoptable: ContentMetadata { timestamp: 2025-04-22T07:11:03.919Z, version: "42.20250422.20.0" }
[DEBUG bootupd::efi] Unmounting
[DEBUG bootupd::blockdev] Find parent devices: ["/dev/vda"]
[DEBUG bootupd::blockdev] Find bios_boot partitions: ["/dev/vda1"]
[TRACE bootupd::component] Adoptable: ContentMetadata { timestamp: 2025-04-22T07:11:03.919Z, version: "42.20250422.20.0" }
[DEBUG bootupd::blockdev] Find parent devices: ["/dev/vda"]
[DEBUG bootupd::bios] Install grub modules on /dev/vda
Adopted and updated: BIOS: grub2-tools-1:2.12-28.fc42.x86_64
[DEBUG bootupd::blockdev] Find parent devices: ["/dev/vda"]
[DEBUG bootupd::blockdev] Find esp partitions: ["/dev/vda2"]
[TRACE bootupd::efi] No efivars mount at "/sys/firmware/efi/efivars"
[TRACE bootupd::efi] No efivars mount at "/sys/firmware/efi/efivars"
[TRACE bootupd::component] Adoptable: ContentMetadata { timestamp: 2025-04-22T07:11:03.919Z, version: "42.20250422.20.0" }
[DEBUG bootupd::blockdev] Find parent devices: ["/dev/vda"]
[DEBUG bootupd::blockdev] Find esp partitions: ["/dev/vda2"]
[DEBUG bootupd::efi] Mounted at "/boot/efi"
[TRACE bootupd::efi] applying adoption diff: additions: 0 removals: 0 changes: 0
[TRACE bootupd::efi] Unmounted
[DEBUG bootupd::efi] Unmounting
Adopted and updated: EFI: grub2-efi-x64-1:2.12-28.fc42.x86_64,shim-x64-15.8-3.x86_64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do experiments like change, initialize the parent devices at the beginning of update
or adopt_and_update
. Not sure if this is what we expect.
errs.push(format!("Removed: {}", f)); | ||
let esps = esp_devices.ok_or_else(|| anyhow::anyhow!("No esp device found!"))?; | ||
let dest_root = Path::new("/"); | ||
for esp_dev in esps.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we now have multiple places where we basically need to iterate over esp_devices and mount them and then do_some_work()
I wish there was an easy way just to make that a funcion and then yield
like we can in python
so that we don't have to copy/paste this same code in many places.
Do you know of way to do that? I did some searching around rust book and the net and didn't find an easy answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I can not find either.
Skip BIOS adopt if booting with efi and none `bios_boot` part
Split `ensure_mounted_esp()` into two functions as we need to support multiple ESPs. - `get_mounted_esp()` to get the already mounted path - `ensure_mounted_esp()` to ensure mount the passed `esp_device`
For installation, only support single ESP now
is not mounted See pointer from coreos#855 (comment) If `/boot` isn't mounted we'll find the parent devices of `/sysroot` instead.
is not mounted See pointer from coreos#855 (comment) If `/boot` isn't mounted we'll find the parent devices of `/sysroot` instead.
Fixes error: `umount: /boot/efi: target is busy.`
The prep PR is #824
Fixes #132
Include 4 PRs:
#[allow(dead_code)]
for functionsis_efi_booted()
as it would fail onBIOS
only