rework grub2 backend for new fedora grub model #717
See also #706 |
Not knowing the basics of the plan yet, I'll stick my neck out:
Now, with GRUB + an updated bls.mod + a static grub.cfg + static BLS snippets, the snippets can be discovered and the grub menu generated dynamically. And bootloader config changes can always be atomic. |
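To make the "discover static snippets, build the menu dynamically" idea concrete, here is a minimal sketch of what a reader of BLS snippets might do. It is not grub's implementation. It assumes the snippets are `*.conf` files holding `key value` lines, and the function names `parse_bls_entry` and `discover_entries` are hypothetical. For simplicity a repeated key overwrites the earlier value.

```python
import os


def parse_bls_entry(text):
    """Parse one snippet: each non-comment line is '<key> <value>'."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        entry[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return entry


def discover_entries(entries_dir):
    """Return (filename, parsed entry) pairs for every *.conf snippet found."""
    found = []
    for name in sorted(os.listdir(entries_dir)):
        if name.endswith(".conf"):
            with open(os.path.join(entries_dir, name)) as f:
                found.append((name, parse_bls_entry(f.read())))
    return found
```

Because the menu is derived from whatever snippets exist at boot, adding or removing a deployment only means adding or removing one small file; the static grub.cfg never has to be regenerated.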
Currently ostree is designed around |
On journaled file systems, sync() only guarantees that data and journal are committed to stable media; it does not guarantee that the journal contents are flushed to fs metadata. This is relevant for bootloaders, none of which can read or replay the journal. So if the fs metadata is not yet updated, you can get a boot failure in certain situations (write new, delete current, rename new to current). Long version (with links for long long long versions trying to figure all of this out) Right now ostree is not susceptible because it requires /boot on a separate fs volume. If that layout were to change (supporting /boot as a directory or Btrfs subvolume) it would be susceptible, but chances are it would not be unbootable as in the bug cited in the issue above; rather, the menu would show only the current deployment and not the new one. As soon as the kernel mounts the file system, it would replay the journal and update fs metadata, so the next reboot would show the newly added BLS entry. But to be certain of avoiding this problem: guarantee umount or remount ro (which may not be in ostree's domain), or use FIFREEZE. |
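For reference, the "write new, rename new to current" pattern mentioned above usually looks like the sketch below. The function name `replace_file_durably` and the `.tmp` suffix are hypothetical. Note the limit described in the comment above: the fsync calls make the change durable for the kernel, which can replay the journal, but on some file systems a bootloader reading the raw disk may still see the old metadata.

```python
import os


def replace_file_durably(path, data):
    """Write data to a temp file, fsync it, rename it over path, fsync the dir."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temp name, same directory as target

    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # file contents reach stable media (or the journal)

    os.replace(tmp_path, path)  # atomic rename on POSIX file systems

    # fsync the containing directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This covers the crash-consistency side only; the journal-flush problem for bootloaders needs an extra step such as unmount, remount read-only, or a freeze/thaw cycle.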
Yep, this is tracked in #876 |
See also http://marc.info/?l=linux-fsdevel&m=150179189820971&w=2 Basically I think the journaling problem may still require more sophistication. |
@cgwalters the BLS support on grub2 was improved for Fedora 28. I believe this better aligns with what's needed by ostree. Basically you can have a static This means that at least on non-EFI machines, grub2 should be able to directly use the BLS fragments managed by ostree. I've tried the following on a Fedora 28 Atomic Workstation image (plus some fixes that are now in the grub2 Fedora package):
And on reboot, grub2 populates the boot options from the ostree BLS fragments in The My understanding is that this should be safe w.r.t. the "journal not flushed" issue, since libostree calls to the Now, the EFI case is a little more tricky. Since the BLS fragments are expected to be in the ESP, which is vfat so we can't do the But grub2 uses a
And have two symlinks in
And on deploy, the
So basically on EFI the |
FWIW my ideal on this is actually that |
I should have said first, thanks a ton for looking at this and testing it! I'll see if I can play with this some, if we're going to productize it it'll have to work through Anaconda too. Which...is probably going to require some install-time option? How about adding
But we still don't have atomic file replacement on VFAT right? That isn't a blocker per se, since it's the status quo today, but it kind of sucks to rework things and not solve this. Did you have any thoughts on my suggestion to have a grub ⇔ kernel-installer protocol that adds layers of redundancy and avoids relying fully on the filesystem to synchronize data? Maybe it's as simple as having grub read in order:
Then ostree does:
? |
On Wed, Apr 25, 2018 at 11:14 AM, Colin Walters ***@***.***> wrote:
FWIW my ideal on this is actually that grub2*.rpm were removed from the
host images entirely - instead, Anaconda would use its copy, write the data
into the /boot partition, and ostree would never touch it. There's some
nits on this like the fact that editions/spins can have their own themed
splash screens etc.; I dunno if I really care much though. If we did care
we could tweak Anaconda to look in the target root for such data.
My two cents, is the design should use one of three methods:
- systemd discoverable partitions (GPT partition type GUID instead of
/etc/fstab) to dynamically mount and unmount the ESP at /efi which keeps it
safer
or
- make the ESP completely static, never persistently mount it, use a
forwarding config on the ESP to point to the real (and changing) bootloader
config in a fixed location no matter what the device firmware is.
or
- both of the above; where the second instance is the usual case, and the
first instance only comes up when the bootloader binary itself needs
updating.
…--
Chris Murphy
|
I'm not seeing why not, at least if the file is 512B and the file size isn't changing. |
On Wed, Apr 25, 2018 at 8:40 AM, Javier Martinez Canillas < ***@***.***> wrote:
My understanding is that this should be safe w.r.t the journal not flushed
issue, since libostree calls to the FIFREEZE and FITHAW ioctls on deploy.
For what it's worth, sync() is still needed on file systems that do not
support FIFREEZE/FITHAW.
So far in testing sync() on FAT, ext4, and Btrfs the changes are fully
committed and appear to the bootloader. Whereas sync() isn't sufficient for
XFS, changes are only guaranteed to be in the log which the bootloader
can't read, hence FIFREEZE/FITHAW. I can understand the XFS devel position
on this as it relates to fsync() but I'm finding it increasingly specious
to apply it to sync().
Having straced both grubby and grub-mkconfig, there is no fsync() or sync()
at all, seemingly the design from the outset was assuming the volume would
be successfully unmounted to fully commit the changes.
…--
Chris Murphy
|
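The sync() plus FIFREEZE/FITHAW combination discussed above can be sketched as follows. This is an illustration under stated assumptions, not libostree's code: the ioctl numbers are computed with the generic Linux encoding used on x86-64 and arm64 (some architectures encode them differently), the function name `flush_for_bootloader` is hypothetical, and the ioctls need CAP_SYS_ADMIN.

```python
import errno
import fcntl
import os


def _iowr(type_char, nr, size):
    """Generic Linux _IOWR() encoding (assumed layout: x86-64, arm64)."""
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


FIFREEZE = _iowr("X", 119, 4)  # _IOWR('X', 119, int) in linux/fs.h
FITHAW = _iowr("X", 120, 4)    # _IOWR('X', 120, int) in linux/fs.h


def flush_for_bootloader(mountpoint):
    """sync(), then freeze and immediately thaw the file system.

    Returns True if the freeze/thaw cycle ran, False if the file system
    does not support it and only sync() was performed.
    """
    os.sync()
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        try:
            fcntl.ioctl(fd, FIFREEZE, 0)
        except OSError as e:
            if e.errno in (errno.EOPNOTSUPP, errno.ENOTTY):
                return False  # fall back to relying on sync() alone
            raise
        fcntl.ioctl(fd, FITHAW, 0)
        return True
    finally:
        os.close(fd)
```

A process that dies between the two ioctls leaves the file system frozen, so anything beyond a sketch would need a safeguard around the freeze window.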
@cgwalters that would certainly be possible with BLS + a static grub2 config file, since there won't be a need for any grub2 utility, script or configuration (besides grub.cfg and grubenv). But do you mean not even having the grub2-efi-x64 package? I now noticed that ostree doesn't update grub2 like we do on the non-atomic Fedora version:
|
Yes, the plan is to do something like that. About the grubenv change not being safe on vfat, @vathpela already commented on this. Please let us know if you think what I mentioned in #717 (comment) is sensible, so I could give it a try and cook a patch for ostree. |
Exactly; it'd be in Anaconda, not in the target tree/image.
Yep, because I don't know how to make it transactional. See: coreos/rpm-ostree#969 There's also mailing list discussion about this, but the TL;DR is we today copy e.g. In the model I'm proposing, |
I am not a lowlevel storage expert; I'd certainly believe that a small write is atomic today with linux+vfat. On this subject I found the sqlite docs useful. My worry (perhaps completely unfounded) is that some level in the stack would defeat this. Anyways, I'm vacillating a bit here myself between whether to try to change+fix everything at once or polish what's landed so far in f28 grub. |
With #1503 we could move libostree to only touch
Hi Chris - this has been rather extensively covered elsewhere, not sure we need to rehash it here. |
Right, that's what I understood but just wanted to be sure.
I see, then everything that's installed by packages in You mentioned in one of the threads that for the server case, frequent re-provisioning is quite common which makes this less of an issue. But not everyone will do that and it's definitely an issue for the Workstation case (I for example only do a fresh install every few years when I change my laptop). At least for EFI (which I guess is the most fragile case due to the ESP using vfat), we could make the upgrade more robust. For example by installing the new shim as And maybe shim could also be changed to check a digest for a Probably @vathpela has better ideas, but what I'm trying to say is that I think we could find a way to make this work reliably. |
Right, I think these are orthogonal issues that could be tackled separately. If anything, the BLS support should improve the current situation since it will avoid the grub2-mkconfig -o grub.cfg.new && cp grub.cfg grub.cfg.old && copy grub.cfg.new grub.cfg that happens on the ESP and would just write to a file with fixed size. |
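The "just write to a file with fixed size" idea can be sketched like this. The block format is an assumption from memory of GRUB's environment block (a signature line, `key=value` lines, then `#` padding up to a fixed size, 1024 bytes by default); the helper names `build_envblk` and `overwrite_in_place` are hypothetical, and `saved_entry` appears only as an illustrative variable.

```python
import os

ENVBLK_SIGNATURE = b"# GRUB Environment Block\n"  # assumed signature line
ENVBLK_SIZE = 1024                                # assumed default block size


def build_envblk(variables, size=ENVBLK_SIZE):
    """Serialize variables into a block padded with '#' to exactly `size` bytes."""
    body = ENVBLK_SIGNATURE
    for key, value in variables.items():
        body += f"{key}={value}\n".encode()
    if len(body) > size:
        raise ValueError("environment does not fit in the fixed-size block")
    return body + b"#" * (size - len(body))


def overwrite_in_place(path, block):
    """Overwrite an existing file of identical size: no truncate, no rename."""
    if os.path.getsize(path) != len(block):
        raise ValueError("size mismatch; refusing to change the file size")
    with open(path, "r+b") as f:
        f.write(block)
        f.flush()
        os.fsync(f.fileno())
```

Because the file's size and location never change, no directory entry or allocation metadata has to be updated on the ESP; only the data sectors are rewritten. Whether that rewrite is atomic still depends on the layers below, as discussed earlier in the thread.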
That'd be awesome. I know it adds a lot of complexity. In the big picture, none of this really matters for "classic" systems as nothing about it is transactional, so there isn't really pressure to fix the bootloader part. But since ostree makes
Yep.
Yeah. I definitely have a worry that we may run into an issue where we start installing new grub config files that old grub doesn't know how to parse. The thing I wanted to avoid as much as possible is a UI screen that effectively says "Don't turn off the power to your computer now, or it may be toast". On a lot of classic package managed (apt/yum/etc) systems, they simply don't even say that, even though it's the case. If we don't make the bootloader installation transactional, then we need to have such a screen and think about when it's run. |
Agreed, although I was referring more to having a first and second stage boot-loader that are many years old.
Yes, I understand the concern. As mentioned, I think it would be possible by using EFI variables and making shim smarter. And that's something that will be an improvement also for package managed distros since as you said shim or grub could contain a regression that leads to an unbootable system (although I don't remember having that issue and have been using Fedora for a long time). But as said, I think that BLS and how to make upgrading the components in the boot chain more reliable are separate issues, and we shouldn't attempt to solve all the problems at once. |
I've filed #1649 to discuss the issue of the ESP never being updated by ostree. Let's keep this issue only for discussing grub2's BLS support and how it can be used from ostree. |
From recent discussion in fedora-devel mailing list, the BLS snippets are going to be in The only remaining issue is that we want grub2 to sort the boot menu entries using the BLS snippets filenames, but ostree uses |
I'm OK changing this, but discussion also made it sound like we'd sort by |
@cgwalters currently yes, but that's only true for grub2. Petitboot for example only uses the BLS filename to sort the entries, so it won't work on ppc64le. So the discussion is if we want to do the same for grub2 to be consistent with what the other bootloaders do. I see that in most places So the actual change should just be the following (modulo fixing the tests to take this into account):
Now, this will only work if there are deployments for a single OS, but if you do:
Then you will a |
BTW, when looking at this I found a bug in
So instead it should use the -v option to sort by version:
I'll propose a PR to fix that. |
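To illustrate why a plain lexical sort goes wrong for versioned filenames and what a version-aware sort changes, here is a simplified natural-sort key. It is only an approximation for illustration, not the exact algorithm behind any tool's version-sort option, and the filenames are made up.

```python
import re


def version_key(name):
    """Split into digit and non-digit runs so numeric runs compare as integers."""
    return [
        (0, int(part)) if part.isdigit() else (1, part)
        for part in re.findall(r"\d+|\D+", name)
    ]


# Hypothetical snippet names, chosen only to show the difference.
names = ["example-4.16.10.conf", "example-4.16.9.conf"]

lexical = sorted(names)                    # compares "1" < "9" character-wise
by_version = sorted(names, key=version_key)  # compares 9 < 10 numerically
```

With the lexical sort, `4.16.10` lands before `4.16.9`, so the older entry would be treated as the newest; the version-aware key puts them in the intended order.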
That makes sense to me! |
Great, I'll propose that change then. Thanks a lot for your feedback! |
@cgwalters I've created PR #1654 that changes the BLS filenames as discussed. |
Is this issue still relevant now that BLS support has been merged into ostree? (As I read earlier, it is already present in ostree.) |
Closing this in favor of #1951 - most of the other things here ended up being implemented for Fedora CoreOS. |
There are some plans to change how grub2 works in Fedora to better align with BLS, among other things.
This should allow us to finally fix the "grub2 config updates aren't atomic on EFI" problem.
TODO: link to plans
@vathpela also points out that we could try testing for
/sys/firmware/efi
instead of the EFI config path potentially.
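A minimal sketch of the `/sys/firmware/efi` check suggested above: on Linux that directory exists only when the running system was booted through EFI firmware. The function name `booted_via_efi` is hypothetical, and the path is a parameter only so the check can be exercised in tests.

```python
import os


def booted_via_efi(sysfs_efi_dir="/sys/firmware/efi"):
    """Return True if the running kernel exposes EFI firmware interfaces."""
    return os.path.isdir(sysfs_efi_dir)
```

Unlike probing for an EFI config path, this reflects how the machine actually booted and does not depend on which files a previous install left on disk.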