feat: Implement measurement and policy generation/luks unlock of sysextensions for UKI #2608

Closed
Tracked by #1792
Itxaka opened this issue May 31, 2024 · 18 comments · Fixed by kairos-io/kairos-agent#372, kairos-io/immucore#330, #2614, kairos-io/packages#890 or kairos-io/packages#912

@Itxaka
Member

Itxaka commented May 31, 2024

Follow up on #2117

This is the implementation part, as the spike revealed that nothing currently does this.

The objective would be to treat sysextensions like a UKI: generate a policy and attach it to them, so that we can use it to unlock the system. It should be possible to produce the policy/signatures offline, so they can be generated outside of the node.

This way we would be able to extend a UKI system via sysext without opening a hole in the security.

For example, k3s or stylus deployed via a sysextension.

The reason for the policy would be to be able to upgrade those sysextensions without breaking the measurements.

The reason for needing sysextensions would be to keep the size of the system to a minimum, as EFI firmware usually can't deal with big EFI files and bundling everything in the EFI file blows up the size.

Refer to the original spike for more info on what's currently on the market for this.

@Itxaka Itxaka added enhancement New feature or request triage Add this label to issues that should be triaged and prioritized in the next planning call labels May 31, 2024
@ci-robbot
Collaborator

I'm sorry, but your issue does not have enough information to be triaged. Could you please provide steps to reproduce, relevant versions of artifacts and a clear description of the issue? I will label your issue with 'add_label_to_github_issue' and 'question' until I have more information to triage it. I am a bot, an experiment of @mudler and @jimmykarily.

@Itxaka Itxaka self-assigned this Jun 6, 2024
@Itxaka
Member Author

Itxaka commented Jun 6, 2024

Useful links:

https://www.freedesktop.org/software/systemd/man/latest/systemd-repart.html#
https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg49152.html
systemd/mkosi@c42d816#diff-9c45af3bc838475da5184dc4aa03361d9055036071fe2831915cdb4379fbd786
https://github.com/flatcar/sysext-bakery
https://www.freedesktop.org/software/systemd/man/latest/systemd.image-policy.html#
https://www.freedesktop.org/software/systemd/man/latest/systemd-sysext.html
https://0pointer.net/blog/fitting-everything-together.html

Status

Support is there and mostly works out of the box.

Create a verity+signed sysext image:

sudo systemd-repart -S -s test/ k3sv1.30.0+k3s1.sysext.raw --private-key=db.key --certificate=db.pem

Copy it into /EFI/NAME/EFINAME.efi.extra.d/EXTENSION.raw
In our case, unfortunately, we need to copy it several times, as there is no generic location like the one addons have, where putting them in loader/addons applies them to all entries.

So we would need to copy it to /EFI/kairos/active.efi.extra.d/EXTENSION.raw and /EFI/kairos/passive.efi.extra.d/EXTENSION.raw
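The double copy could be scripted roughly as follows. This is only a sketch: `install_extension` and the `efi_root` parameter (the mount point of the EFI partition) are made-up names for illustration, and only the `EFI/kairos/<entry>.efi.extra.d/` layout comes from the paths above.

```python
import shutil
from pathlib import Path

# Entry names taken from the paths mentioned in this thread.
ENTRIES = ("active", "passive")


def install_extension(image: Path, efi_root: Path, entries=ENTRIES) -> list:
    """Copy one sysext image into <efi_root>/EFI/kairos/<entry>.efi.extra.d/ for each entry."""
    written = []
    for entry in entries:
        target_dir = efi_root / "EFI" / "kairos" / f"{entry}.efi.extra.d"
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / image.name
        # copyfile copies contents only, which avoids metadata calls on a FAT partition
        shutil.copyfile(image, target)
        written.append(target)
    return written
```

Passing a shorter `entries` tuple would install the image for a single entry only.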

Then reboot; systemd-stub will read those raw extensions and pass them to the initramfs inside the /.extra/sysext dir.

That dir is only read during initramfs, which is signalled via the file /etc/initrd-release or via the env var SYSTEMD_IN_INITRD. We can use this to make it read the .extra dir automatically AND make it default to a more secure policy.

Call sysext with a strict policy: systemd-sysext refresh --image-policy="root=verity+signed:usr=verity+signed"

Things magically appear!

root@localhost:~# systemd-sysext refresh --image-policy="root=verity+signed:usr=verity+signed"
[75963.662137] device-mapper: table: 252:2: verity: Root hash verification failed (-ENOKEY)
[75963.662706] device-mapper: ioctl: error adding target to table
device-mapper: reload ioctl on 65092112751c46ee4dbc6039ba2920e0cb09035d5d158c7a72454a15fa931afa-verity (252:2) failed: Required key not available
Using extensions 'k3sv1.30.0+k3s1'.
Merged extensions into '/usr'.
root@localhost:~# k3s -v
k3s version v1.30.0+k3s1 (14549535)
go version go1.22.2

Things that we need to do to support this out of the box

Decide whether we support initrd (early) sysext, persistent (late) sysext, or both.

initrd/early means that we can enable the sysext really early, just after mounting the main system mounts. This is a bit dangerous, as we still haven't mounted anything else from persistent, so there is an overlap issue here, e.g. a sysext mounts something on /usr/local/X and then persistent mounts over it as well. It also means that we have to force the loading, as sysexts are not supported during initrd, only confexts. Forcing them could work, but we would be skipping a few sanity checks, so this becomes prone to failure.

IMO, we should wait until persistent is loaded, then immediately afterwards run sysext so we load things over the persistent stuff. That makes more sense to me due to the possibility of upgrading existing things on persistent, like k3s, via sysext, and it has fewer moving parts.

As we cannot use the initrd env var without breaking a lot of stuff, the idea would be to copy the /.extra/sysext/* files into /run/extensions so sysext picks them up and they are cleared on reboot. It would also pick up anything under /var/lib/extensions, which is on persistent, so bundled images or manually copied files in there would also work. This is nice because /run/extensions gets cleared on each reboot and we don't have to manage copying and removing them from persistent.

In fact I would recommend using the EFI sysext method only for generic things (k3s for example), as troubleshooting is easier: breaking it and cleaning it up can be done via recovery and doesn't require unlocking the persistent partition.

sysext on /var/lib/extensions should be used for extensions that need to be locked out of easy access (private code, credentials and such), as they are more difficult to access from the outside (booting a livecd, recovery).


Decide whether to make the strict policy the default by overriding the service file

Currently the default image policy (verity+signed, absent and more; see Disk Image Dissection Policy) seems to be a bad default, as it basically accepts everything.
Triggering the initrd env var makes the policy stricter ("root=signed+absent:usr=signed+absent"), which is good, BUT we are not supposed to use sysext in initrd, only confexts. So we can't really use that fake initrd var to default to a stricter policy.

IMO, we should override the default sysext service to add the image-policy flag with the strict policy by default and run the service.
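One way to do such an override is a systemd drop-in that clears and redefines ExecStart. The sketch below only renders and writes that drop-in; the file name `10-image-policy.conf` is hypothetical, and it assumes the stock unit's only relevant command is the `systemd-sysext refresh` call, which should be checked against the unit shipped by the distribution.

```python
from pathlib import Path

STRICT_POLICY = "root=verity+signed:usr=verity+signed"  # policy string used in this thread


def render_dropin(policy: str = STRICT_POLICY) -> str:
    """Return drop-in text that replaces ExecStart so refresh runs with --image-policy."""
    return (
        "[Service]\n"
        "ExecStart=\n"  # an empty assignment clears the ExecStart inherited from the unit
        f"ExecStart=systemd-sysext refresh --image-policy={policy}\n"
    )


def write_dropin(unit_dir: Path, policy: str = STRICT_POLICY) -> Path:
    """Write the drop-in under <unit_dir>/systemd-sysext.service.d/ and return its path."""
    dropin_dir = unit_dir / "systemd-sysext.service.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / "10-image-policy.conf"  # hypothetical file name
    path.write_text(render_dropin(policy))
    return path
```

Pointing `unit_dir` at /run/systemd/system would keep the override volatile, while /etc/systemd/system would persist it; which one fits here is an open choice.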

Notice that it is still possible to manually run sysext as root and skip the policy or set a more relaxed one, but at that point, with root you can basically do anything (download a binary into persistent, for example), so this is expected. We just want to be able to load extensions during boot via immucore and fail if they don't conform to our strict policy before giving a shell.


Decide what to do when we find an unsigned sysext

We have 2 possibilities: either warn and continue, or directly fail booting. As the sysexts are PER EFI file, even if we add them to active+passive, recovery still does NOT load them, so if something goes wrong, a user can still boot into recovery and manually delete the sysextensions from the EFI partition directly, which should make it work again.

IMO, we should default to the strict policy and allow opting into a permissive one instead. Configuration in /oem/*.yaml should be possible if we are loading them after persistent.
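The two behaviours could be switched on a single setting, roughly like this. It is a sketch only: the mode names `strict` and `permissive`, the function name and the exception class are assumptions for illustration, not existing Kairos configuration.

```python
import logging

log = logging.getLogger("sysext")


class UnsignedExtensionError(RuntimeError):
    """Raised in strict mode when an extension fails the policy check."""


def handle_policy_failure(name: str, mode: str = "strict") -> bool:
    """Return False to skip the extension (permissive) or raise to abort boot (strict)."""
    if mode == "permissive":
        # warn and continue: the caller simply does not load this extension
        log.warning("sysext %s does not satisfy the image policy, skipping it", name)
        return False
    if mode == "strict":
        # fail booting: the caller is expected to let this propagate
        raise UnsignedExtensionError(f"sysext {name} does not satisfy the image policy")
    raise ValueError(f"unknown sysext policy mode: {mode!r}")
```

The default of `strict` mirrors the suggestion above; flipping the default is a one-word change.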


Decide how to deal with 1 broken/unsigned sysext

Currently there is an issue where, if you have 10 sysexts and just one doesn't fulfill the policy, it stops loading all of them.

issue: systemd-sysext refuses to mount any extension if one is wrongly signed, even if its extension-release does not match os-release (systemd/systemd#32762)
pr: sysext: ignore invalid image files (systemd/systemd#32967)

I'm not sure the PR covers our use case, as it seems to only check for invalid images (ones that do not match the system ID and such), but it's a real pain in the ass.

Ideally, as we are in control of the process and we copy them into place, we should be able to verify them before copying them, but it's a laborious process.
We would need to mount the image manually on loop devices (3 partitions), extract the verity signature and the key, and check them against the keys in the system. It sounds easy, but I don't have the full process identified, so it could be very complex.

Another way would be much simpler but slower: we copy 1 extension, call sysext refresh, and if the return code is not zero, we remove that extension and keep going.
This solution is the most straightforward currently, as any fixes for this will come in systemd 257 and manually recreating the verity process may be time consuming, but it also slows down the whole boot depending on how many extensions there are (consider bisecting for a big number of extensions instead of going one by one).
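A minimal sketch of the one-by-one approach, assuming `systemd-sysext refresh` reports policy failures through its exit code as described above. The function names are made up, and the `refresh` callable is injectable so the loop can be exercised without systemd.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def refresh_ok() -> bool:
    """Run `systemd-sysext refresh` and report whether it exited with 0."""
    return subprocess.run(["systemd-sysext", "refresh"]).returncode == 0


def load_one_by_one(
    images: Iterable[Path],
    dest: Path,
    refresh: Callable[[], bool] = refresh_ok,
) -> list:
    """Copy images into dest one at a time, dropping any image that makes refresh fail."""
    dest.mkdir(parents=True, exist_ok=True)
    kept = []
    needs_refresh = False
    for image in images:
        target = dest / image.name
        shutil.copyfile(image, target)
        if refresh():
            kept.append(target)
            needs_refresh = False
        else:
            target.unlink()  # remove the offender so later refreshes can succeed
            needs_refresh = True
    if needs_refresh:
        refresh()  # the last image failed: merge the surviving set once more
    return kept
```

The number of refresh calls grows linearly with the number of extensions, which is the slowness mentioned above.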


Decide how to sign the extensions for documentation purposes

For signing the extensions we need a key and cert. I have been using the secureboot ones and they work as expected (either PK, KEK or DB work), BUT due to a configuration not enabled in the kernel by default, the MOK keys are not available in the kernel keyring, which means that if we are using those keys for the sysext signing we need to manually extract them and put them under /run/verity.d, which is trivial.

More problematic is the case where the user decides to sign them with a different key+cert; then we need a way of providing the public cert and storing it somewhere. Probably via config file as usual?

sysext_cert: |
  -----BEGIN CERTIFICATE-----
.....

This is trivial. IMO, we should recommend signing this stuff with the DB key in the same process that signs the EFI files, but I'm sure someone is gonna ask for this, so we better have an alternative ready. It kind of makes sense to have a different key for this, as you can keep your DB key secure but may need to sign sysexts more frequently (k3s patch versions, for example).
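Handling such a config entry could look roughly like this. It is a sketch under assumptions: `sysext_cert` is only the key proposed above, `install_sysext_cert` and the `sysext.crt` file name are hypothetical, and the `.crt` suffix should be checked against what the running systemd version looks for in the verity.d directories.

```python
from pathlib import Path
from typing import Optional

VERITY_DIR = Path("/run/verity.d")  # directory named in this thread


def install_sysext_cert(config: dict, verity_dir: Path = VERITY_DIR) -> Optional[Path]:
    """Write the PEM found under the proposed `sysext_cert` key into verity_dir."""
    pem = config.get("sysext_cert")
    if not pem:
        return None  # nothing configured, nothing to write
    if "-----BEGIN CERTIFICATE-----" not in pem:
        raise ValueError("sysext_cert does not look like a PEM certificate")
    verity_dir.mkdir(parents=True, exist_ok=True)
    target = verity_dir / "sysext.crt"  # hypothetical file name
    target.write_text(pem.strip() + "\n")
    return target
```

Because /run is volatile, this would have to run on every boot before the extensions are validated.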


Final notes

This is pretty much ready for consumption, with only minor things left. We should be OK to have it ready for 3.1.x, maybe in beta status while we polish it.

@mudler
Member

mudler commented Jun 6, 2024

Status

Support is there and mostly works out of the box.

That's great! Awesome finds @Itxaka!

Things that we need to do to support this out of the box

Decide whether we support initrd (early) sysext, persistent (late) sysext, or both.

initrd/early means that we can enable the sysext really early, just after mounting the main system mounts. This is a bit dangerous, as we still haven't mounted anything else from persistent, so there is an overlap issue here, e.g. a sysext mounts something on /usr/local/X and then persistent mounts over it as well. It also means that we have to force the loading, as sysexts are not supported during initrd, only confexts. Forcing them could work, but we would be skipping a few sanity checks, so this becomes prone to failure.

This might make sense if we let sysext overlay directories outside of /usr as well.

IMO, we should wait until persistent is loaded, then immediately afterwards run sysext so we load things over the persistent stuff. That makes more sense to me due to the possibility of upgrading existing things on persistent, like k3s, via sysext, and it has fewer moving parts.

That makes the most sense indeed.

As we cannot use the initrd env var without breaking a lot of stuff, the idea would be to copy the /.extra/sysext/* files into /run/extensions so sysext picks them up and they are cleared on reboot. It would also pick up anything under /var/lib/extensions, which is on persistent, so bundled images or manually copied files in there would also work. This is nice because /run/extensions gets cleared on each reboot and we don't have to manage copying and removing them from persistent.

In fact I would recommend using the EFI sysext method only for generic things (k3s for example), as troubleshooting is easier: breaking it and cleaning it up can be done via recovery and doesn't require unlocking the persistent partition.

sysext on /var/lib/extensions should be used for extensions that need to be locked out of easy access (private code, credentials and such), as they are more difficult to access from the outside (booting a livecd, recovery).

That's going to be tricky; it would be nice to expose that to the user down the line (for instance, picking up extensions from the LiveCD and installing them in the persistent portion, which is mapped to /var/lib/extensions).

Decide whether to make the strict policy the default by overriding the service file

Currently the default image policy (verity+signed, absent and more; see Disk Image Dissection Policy) seems to be a bad default, as it basically accepts everything. Triggering the initrd env var makes the policy stricter ("root=signed+absent:usr=signed+absent"), which is good, BUT we are not supposed to use sysext in initrd, only confexts. So we can't really use that fake initrd var to default to a stricter policy.

IMO, we should override the default sysext service to add the image-policy flag with the strict policy by default and run the service.

Definitely, otherwise an open policy would let anyone drop a file in there and have it loaded, overlaying part of the system (? to be tested ?) - better safe than sorry.

Decide what to do when we find an unsigned sysext

We have 2 possibilities: either warn and continue, or directly fail booting. As the sysexts are PER EFI file, even if we add them to active+passive, recovery still does NOT load them, so if something goes wrong, a user can still boot into recovery and manually delete the sysextensions from the EFI partition directly, which should make it work again.

One of the major problems with this approach is that there is no manual intervention in most cases, so it would be better to be conservative here and not default to a strict policy. This allows a downstream component (e.g. a provider or a system agent) to take over and decide to recover in other ways automatically without breaking the box. However, this should be configurable during installation via cloud config.

Decide how to deal with 1 broken/unsigned sysext

Currently there is an issue where, if you have 10 sysexts and just one doesn't fulfill the policy, it stops loading all of them.

* issue: [systemd-sysext refuses to mount any extension if one is wrongly signed, even if its extension-release does not match os-release systemd/systemd#32762](https://github.com/systemd/systemd/issues/32762)

* pr: [sysext: ignore invalid image files systemd/systemd#32967](https://github.com/systemd/systemd/pull/32967)

I'm not sure the PR covers our use case, as it seems to only check for invalid images (ones that do not match the system ID and such), but it's a real pain in the ass.

Ideally, as we are in control of the process and we copy them into place, we should be able to verify them before copying them, but it's a laborious process. We would need to mount the image manually on loop devices (3 partitions), extract the verity signature and the key, and check them against the keys in the system. It sounds easy, but I don't have the full process identified, so it could be very complex.

Another way would be much simpler but slower: we copy 1 extension, call sysext refresh, and if the return code is not zero, we remove that extension and keep going. This solution is the most straightforward currently, as any fixes for this will come in systemd 257 and manually recreating the verity process may be time consuming, but it also slows down the whole boot depending on how many extensions there are (consider bisecting for a big number of extensions instead of going one by one).

Let's stick to the simple case here - default is good. If even one of the sysexts is broken, we should ignore all of them or fatal (as in the question above) - it means that either the machine is being compromised, or there is a bug that must be resolved when installing sysexts or building installable media.

Decide how to sign the extensions for documentation purposes

For signing the extensions we need a key and cert. I have been using the secureboot ones and they work as expected (either PK, KEK or DB work), BUT due to a configuration not enabled in the kernel by default, the MOK keys are not available in the kernel keyring, which means that if we are using those keys for the sysext signing we need to manually extract them and put them under /run/verity.d, which is trivial.

More problematic is the case where the user decides to sign them with a different key+cert; then we need a way of providing the public cert and storing it somewhere. Probably via config file as usual?

sysext_cert: |
  -----BEGIN CERTIFICATE-----
.....

This is trivial. IMO, we should recommend signing this stuff with the DB key in the same process that signs the EFI files, but I'm sure someone is gonna ask for this, so we better have an alternative ready. It kind of makes sense to have a different key for this, as you can keep your DB key secure but may need to sign sysexts more frequently (k3s patch versions, for example).

We can think about iterating later; let's keep it very simple for now and document using the same keys as for signing the EFI files.

To me it makes sense to use the same keys used for the UKI in any case, because if the keys mismatch the system (UKI) is not going to boot at all, just as with mismatched SB signatures.

Final notes

This is pretty much ready for consumption, with only minor things left. We should be OK to have it ready for 3.1.x, maybe in beta status while we polish it.

👍

@mudler mudler mentioned this issue Jun 6, 2024
33 tasks
@mauromorales
Member

So we would need to copy it to /EFI/kairos/active.efi.extra.d/EXTENSION.raw and /EFI/kairos/passive.efi.extra.d/EXTENSION.raw

shoot and I assume there are no soft links at this level then, but for keeping track of which extensions there are on different images this might actually be good

Amazing work @Itxaka 👏

@Itxaka
Member Author

Itxaka commented Jun 6, 2024

So we would need to copy it to /EFI/kairos/active.efi.extra.d/EXTENSION.raw and /EFI/kairos/passive.efi.extra.d/EXTENSION.raw

shoot and I assume there are no soft links at this level then, but for keeping track of which extensions there are on different images this might actually be good

Amazing work @Itxaka 👏

No :(

But at least it gives us some fine-grained control for users? Like maybe they want to deploy a new version of X only to active and keep the old one on passive, so if it fails they can go back easily?

@Itxaka
Member Author

Itxaka commented Jun 6, 2024

This might make sense if we let sysext overlay directories outside of /usr as well.

Indeed, but in that case only confexts are supported. We can force other stuff to be loaded, but we would be skipping some sanity checks, so we would need to be careful there.

That's going to be tricky; it would be nice to expose that to the user down the line (for instance, picking up extensions from the LiveCD and installing them in the persistent portion, which is mapped to /var/lib/extensions).

Yes, I'm currently testing that, because otherwise this is boring as f to test. Currently it copies them into the EFI partition, but it's one toggle away from putting them in a different place as part of the install: kairos-io/kairos-agent#372

Definitely, otherwise an open policy would let anyone drop a file in there and have it loaded, overlaying part of the system (? to be tested ?) - better safe than sorry.

Well, only root can do that (dir permissions + sysext run permissions), so it's not a big issue IMO. If you are root you can do anything :D

One of the major problems with this approach is that there is no manual intervention in most cases, so it would be better to be conservative here and not default to a strict policy. This allows a downstream component (e.g. a provider or a system agent) to take over and decide to recover in other ways automatically without breaking the box. However, this should be configurable during installation via cloud config.

Yes, indeed. For the first versions we could go with permissive by default and let users choose enforce. It would be nice if we could hook systemd into this to crash so the boot assessment can kick in, maybe? Anyway, that's for future versions.

Let's stick to the simple case here - default is good. If even one of the sysexts is broken, we should ignore all of them or fatal (as in the question above) - it means that either the machine is being compromised, or there is a bug that must be resolved when installing sysexts or building installable media.

OK, sounds good. The current test implementation goes by copying one extension, trying to refresh the extensions and, if it fails to load, removing the extension and continuing with the others.
But yeah, just failing is simpler :D

We can think about iterating later; let's keep it very simple for now and document using the same keys as for signing the EFI files.

To me it makes sense to use the same keys used for the UKI in any case, because if the keys mismatch the system (UKI) is not going to boot at all, just as with mismatched SB signatures.

Yes, to me it also makes sense, but if the OS build and sysext build go at different paces (you update the OS every 6 months or more, but want to update sysexts with patch releases every, I dunno, month), then it makes sense to have another key for this. Mainly because it's easier to reroll that key and use a new one. And if it gets exposed because you keep taking it out of cold storage to sign stuff, then it's a more or less minor issue: you update your public key in the configs and old sysexts are invalidated immediately.

Both use cases have pros and cons, we should explain this carefully in the documentation!

@Itxaka
Member Author

Itxaka commented Jun 7, 2024

Let's stick to the simple case here - default is good. If even one of the sysexts is broken, we should ignore all of them or fatal (as in the question above) - it means that either the machine is being compromised, or there is a bug that must be resolved when installing sysexts or building installable media.

OK, sounds good. The current test implementation goes by copying one extension, trying to refresh the extensions and, if it fails to load, removing the extension and continuing with the others. But yeah, just failing is simpler :D

OK, good news here: there is a tool to check whether an image is valid before mounting it, and it covers the policy, so we should be OK to do this by:

  • Go over all files to copy
  • systemd-dissect /run/extensions/k3sv1.30.0+k3s1.sysext.raw --validate --image-policy="root=verity+signed+absent:usr=verity+signed+absent"
  • if exit code == 0 copy it, otherwise log an error or warning
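The three steps above could be wired together roughly as follows. This is a sketch rather than the immucore implementation: the function names are made up, the `*.raw` glob is an assumption, and only the `systemd-dissect` invocation and the policy string are taken from the command above.

```python
import logging
import shutil
import subprocess
from pathlib import Path
from typing import Callable

log = logging.getLogger("sysext")

POLICY = "root=verity+signed+absent:usr=verity+signed+absent"  # from the command above


def dissect_validate(image: Path, policy: str = POLICY) -> bool:
    """Return True when `systemd-dissect --validate` accepts the image under the policy."""
    cmd = ["systemd-dissect", str(image), "--validate", f"--image-policy={policy}"]
    return subprocess.run(cmd, capture_output=True).returncode == 0


def copy_valid_extensions(
    src: Path,
    dest: Path,
    validate: Callable[[Path], bool] = dissect_validate,
) -> list:
    """Copy every *.raw image from src to dest that passes validation; log and skip the rest."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for image in sorted(src.glob("*.raw")):
        if validate(image):
            target = dest / image.name
            shutil.copyfile(image, target)
            copied.append(target)
        else:
            log.warning("sysext %s failed validation, not copying it", image.name)
    return copied
```

Keeping `validate` injectable means the copy logic can be tested on machines without systemd-dissect installed.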

Very nice! This would require installing an extra package on Ubuntu, systemd-container, but it adds about 1MB so... doable :D Plus, if there is any issue with loading a raw extension or something, it can give us a lot of info about it for troubleshooting 👍

      Name: k3sv1.30.0+k3s1.sysext.raw
      Size: 74.3M
 Sec. Size: 512
     Arch.: x86-64

Image UUID: 6b27d67a-a503-4966-9c6b-f18a88c09da4
 sysext R.: ID=_any
            ARCHITECTURE=x86-64
            EXTENSION_RELOAD_MANAGER=1

    Use As: ✗ bootable system for UEFI
            ✗ bootable system for container
            ✗ portable service
            ✗ initrd
            ✓ sysext for system
            ✓ sysext for portable service
            ✗ sysext for initrd
            ✗ confext for system
            ✗ confext for portable service
            ✗ confext for initrd

RW DESIGNATOR      PARTITION UUID                       PARTITION LABEL        FSTYPE                ARCHITECTURE VERITY GROWFS NODE         PARTNO
ro root            83068f04-8147-d03e-90fb-aef5f4180480 root-x86-64            erofs                 x86-64       signed no     /dev/loop0p1      1
ro root-verity     86c6c45d-4f5e-e63d-06fd-2aedfa7d45bc root-x86-64-verity     DM_verity_hash        x86-64       -      no     /dev/loop0p2      2
ro root-verity-sig ab157e68-35a3-4c71-9a00-51a4a8d54c44 root-x86-64-verity-sig verity_hash_signature x86-64       -      no     /dev/loop0p3      3

@Itxaka
Member Author

Itxaka commented Jun 7, 2024

Tested and works as expected.

  • On install, anything on the install media that matches *.sysext.raw gets copied into the proper EFI dir for active+passive.
  • On boot, the stub will measure those and pass them into the initrd under /.extra/sysext/
  • Immucore will add a timeout and default policy to the sysext service (should be moved to the cloud config defaults for UKI)
  • Immucore will check file by file and validate against the policy
  • Files that match the policy will get copied under /run/extensions
  • The default cloud configs already enable systemd-sysext on boot
  • On login, the user will have access to whatever is in the sysexts

Files that don't match are ignored and not copied.

@Itxaka
Member Author

Itxaka commented Jun 7, 2024

OK, mostly done. The only thing missing is to add the bits and bobs for further configuration, but I think we could merge this first iteration, start adding end-to-end tests in kairos to verify it further, and then start expanding it with more config.

Configs to add:

  • Copy sysexts from /.extra/sysext to persistent instead of to /run/extensions (I still think this is a bad idea. I'm looking for a good use case, but I can only see problems with this, plus a lot of dev time for managing existing sysexts, ones that are no longer valid and so on.)
  • Strict vs. permissive toggle. Currently we just skip whatever does not match the policy (basically, if it's not signed or the signature's public key is not in the system, we skip it). That is kind of permissive+strict but without breaking the system. A good workflow needs to be found for what we do when there are mixed sysexts, some of them signed and some not. We could do the same as SELinux: disabled, permissive, enforced? Disabled would just copy all. Permissive would copy all but warn about the ones that do not fit the policy. Enforced would copy only the ones that fit the policy and ignore the others.
  • How to expose the public key for the extension verification? Currently we will advise using the EFI keys to sign, as that makes it work out of the box (we extract and convert them on the fly during boot!), but a user could use their own key, and we would need to store that key during boot in the /run/verity.d dir so sysext can verify them. It could also be added during install if provided, but that would mean it is in the immutable rootfs, so more difficult to rotate. A simple entry in the config to provide the certificate would be enough.
  • Do we want to expose where to copy it, i.e. which .efi file it would affect and be loaded for? Currently we blindly copy them into both active and passive, but I see the potential here to apply some of them only to one or the other. I also see a lot of complex configuration for this... a map of extension regex to entry?
extensions:
  - "k3sv1.29.*":
    - active
    - customEntry
  - "k3sv1.30.*":
    - passive

Or the other way around:

extensions:
  - active:
    - "k3sv1.29.*"
    - "rke2v1.28.*"
  - passive:
    - "k3sv1.30.*"
    - "rke2v1.29.*"

And anything that doesn't match goes into both? And we support any custom entries there in the efi stuff.
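Resolving the second layout (entry to list of patterns) could be sketched like this. It assumes the config has already been parsed into a plain dict, treats each pattern as a regular expression anchored at the start of the file name, and uses the hypothetical name `entries_for`; falling back to every entry when nothing matches is just the idea floated above, not a decided behaviour.

```python
import re

DEFAULT_ENTRIES = ("active", "passive")


def entries_for(extension: str, mapping: dict, default_entries=DEFAULT_ENTRIES) -> list:
    """Return the boot entries whose patterns match the extension file name."""
    matched = [
        entry
        for entry, patterns in mapping.items()
        if any(re.match(pattern, extension) for pattern in patterns)
    ]
    # anything that matches no pattern goes to every default entry
    return matched or list(default_entries)


mapping = {
    "active": ["k3sv1.29.*", "rke2v1.28.*"],
    "passive": ["k3sv1.30.*", "rke2v1.29.*"],
}
print(entries_for("k3sv1.30.0+k3s1.sysext.raw", mapping))  # ['passive']
```

Custom entries would simply be additional keys in the mapping.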

@Itxaka
Member Author

Itxaka commented Jun 7, 2024

Once this stuff is merged, I'll open cards for the above things.

@Itxaka
Member Author

Itxaka commented Jun 7, 2024

Also, I'm gonna try to do a sysext with the contents of /usr/lib/firmware/ and remove those from the rootfs xD
For that we might need to do an early sysext mount so everything works, but let's see with a manual test if it works first lol. We would get the slim image + full firmware if so, which would be really nice...
I wonder if we could slim it even more.....

@jimmykarily
Contributor

I see the examples refer to k3s/rke/etc. Is it possible that we also make the provider-kairos a sysext? I think we've discussed it in the past, but after doing the research, do you think it would be possible to only produce "core" images and spice them up with providers in the form of sysextensions? This would also allow us to test new provider versions without having to build full images.

If that's possible, we would need to find a way to do the same in the non-UKI case (probably without the signing and all).

@Itxaka
Member Author

Itxaka commented Jun 7, 2024

Oh yes, there is still a question in the air. Currently immucore sets the sysext timeout/policy on the fly on each boot, but ideally we could move that to the cloud-config default files for UKI only.
I think it makes sense to move it there.

Thoughts @mauromorales @mudler @jimmykarily?

@Itxaka
Member Author

Itxaka commented Jun 7, 2024

I see the examples refer to k3s/rke/etc. Is it possible that we also make the provider-kairos a sysext? I think we've discussed it in the past, but after doing the research, do you think it would be possible to only produce "core" images and spice them up with providers in the form of sysextensions? This would also allow us to test new provider versions without having to build full images.

If that's possible, we would need to find a way to do the same in the non-UKI case (probably without the signing and all).

Yes, there should be no issues with the provider or anything else as long as it goes under /usr or /opt.
Yes, on non-UKI it probably works out of the box, as long as we put the sysexts in the proper places or make immucore copy them under non-UKI as well.
Plus, if the policy is only set for UKI, then we should be able to do the same pretty transparently. BUT we would be dropping Alpine for sure, as it has no support for it.

I would say that this feature might be UKI-exclusive. We don't have the kind of size issues in non-UKI that would require this, and we have bundles for extending the system, which cover this, no?
It would indeed be nice to have a single way of building variations of this, but then we have to choose between what we currently have and dropping Alpine support :(

@Itxaka
Member Author

Itxaka commented Jun 7, 2024

ah just saw that the provider is currently using /system so.... maybe not...

@Itxaka
Member Author

Itxaka commented Jun 7, 2024

maybe we could expand the plugin system to look in both /system/providers and /usr/kairos/providers, and move everything over there?

@Itxaka Itxaka reopened this Jun 7, 2024
@Itxaka Itxaka linked a pull request Jun 7, 2024 that will close this issue
@Itxaka Itxaka reopened this Jun 11, 2024
@Itxaka
Member Author

Itxaka commented Jun 11, 2024

After all the work, we hit an issue.

Currently the system works, but as soon as sysext finds any extensions it creates an overlayfs on /usr.

This means that our current /usr/local, with all the state and such, gets lost.

Lots of things have been tried, but not much seems to work.

One thing that does work is:

  • run systemd-sysext so it sets up /usr and the dirs below it with the sysextensions merged in
  • mount persistent in a different dir, e.g. /run/persistent
  • keep the persistent data in a dir inside the persistent partition (e.g. data/), not directly in the root dir of the partition (important for the next step)
  • mount an overlay with lower dir /usr/local, upper dir /run/persistent/data and workdir /run/persistent/work (the workdir needs to be on the same filesystem as the upper dir)

That way you get a proper merge from systemd-sysext AND we can still mount our persistent data in there:

sysext on /usr type overlay (ro,nodev,relatime,lowerdir=/run/systemd/sysext/meta/usr:/run/systemd/sysext/extensions/work/usr:/usr,redirect_dir=on,nouserxattr)
/dev/mapper/vda3 on /run/persistent type ext4 (rw,relatime)
overlay on /usr/local type overlay (rw,relatime,lowerdir=/usr/local/,upperdir=/run/persistent/local/,workdir=/run/persistent/work,uuid=on,xino=off,nouserxattr)
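
The steps above can be sketched roughly as follows. This is only an illustration of the mount order, not what immucore does: the function name and the `RUN` dry-run switch are hypothetical, the device is taken from the mount output above, and the directory names follow the step list (`data/` and `work/`).

```shell
#!/bin/sh
# Sketch of the mount order described above. Set RUN=echo to print the
# commands instead of executing them (executing them needs root).
RUN=${RUN:-}
# Assumption: device name copied from the mount output in this comment.
PERSISTENT_DEV=${PERSISTENT_DEV:-/dev/mapper/vda3}
PERSISTENT_MNT=/run/persistent

mount_usr_local_overlay() {
    # 1. Let systemd-sysext build its read-only overlay on /usr first.
    $RUN systemd-sysext refresh || return 1

    # 2. Mount the persistent partition away from /usr/local.
    $RUN mkdir -p "$PERSISTENT_MNT" || return 1
    $RUN mount "$PERSISTENT_DEV" "$PERSISTENT_MNT" || return 1

    # 3. upperdir and workdir must live on the same filesystem.
    $RUN mkdir -p "$PERSISTENT_MNT/data" "$PERSISTENT_MNT/work" || return 1

    # 4. Writable overlay: merged /usr/local below, persistent data on top.
    $RUN mount -t overlay overlay \
        -o "lowerdir=/usr/local,upperdir=$PERSISTENT_MNT/data,workdir=$PERSISTENT_MNT/work" \
        /usr/local
}

# Dry run:
#   RUN=echo; mount_usr_local_overlay
```
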

This implies changing the actual boot process in immucore to do this instead of the usual direct mount under /usr/local.
It also implies having an upgrade path for this so we can upgrade from 3.0 to 3.1 (should be simple).

Or we could just go and do our own thing.

  • Docker images
  • Overlay them ourselves
  • Check the signatures (how? which? These are not PE files, so we cannot just sign them as is)

And that's it.

@Itxaka
Member Author

Itxaka commented Jun 12, 2024

We can fix this by overriding SYSTEMD_SYSEXT_HIERARCHIES to point to /usr/local :D

Here we can see our existing /usr/local dirs like .state AND the overlaid extension:

root@localhost:~# ls /usr/local/
cloud-config  etc  lib  local  work
root@localhost:~# SYSTEMD_SYSEXT_HIERARCHIES=/usr/local systemd-sysext refresh
device-mapper: reload ioctl on c7afa3c7939366ed215f32966b31abcca35a9e5b51d2113400e6914dd4cca11f-verity (252:2) failed: Required key not available
Using extensions 'work'.
Merged extensions into '/usr/local'.
root@localhost:~# ls /usr/local/
bin  cloud-config  etc  lib  local  work
root@localhost:~# ls /usr/local/bin/hello.sh 
/usr/local/bin/hello.sh
root@localhost:~# hello.sh 
Hello world
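
To keep the override across boots instead of setting it per invocation as in the session above, one option might be a systemd drop-in for systemd-sysext.service. This is a sketch under assumptions: it only helps where the merge is triggered through that unit, and the function name and the `10-hierarchies.conf` file name are hypothetical.

```shell
#!/bin/sh
# Sketch: write a drop-in that sets SYSTEMD_SYSEXT_HIERARCHIES for
# systemd-sysext.service. The unit directory is overridable so this can
# be tried against a scratch directory first.
write_sysext_hierarchies_dropin() {
    hierarchies=$1                       # colon-separated absolute paths
    unit_dir=${2:-/etc/systemd/system}
    dropin_dir="$unit_dir/systemd-sysext.service.d"

    mkdir -p "$dropin_dir" || return 1
    # The drop-in file name is arbitrary (hypothetical).
    printf '[Service]\nEnvironment=SYSTEMD_SYSEXT_HIERARCHIES=%s\n' \
        "$hierarchies" > "$dropin_dir/10-hierarchies.conf"
}

# Example:
#   write_sysext_hierarchies_dropin /usr/local
#   systemctl daemon-reload
```
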

Something to notice in the session above is that the overlaid extension went into the proper dir! So that means it's merging at the /usr/local level.
That is good for us as we merge both, BUT in that case we will only support things that are under the SYSTEMD_SYSEXT_HIERARCHIES list, so we have to be careful and test this.

  • what happens when a sysext deploys to /usr/share for example which has an underlying mount as well?
/dev/mapper/vda3 on /usr/share/pki/trust type ext4 (rw,relatime)
/dev/mapper/vda3 on /usr/share/pki/trust/anchors type ext4 (rw,relatime)
  • what happens with other mounts like /usr/bin? /usr/share?
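
To check which existing mounts a given hierarchy would have to coexist with, a small helper like the one below could list the submounts. It is a sketch only: the function name is hypothetical, and it expects /proc/mounts-style lines on stdin (device, mount point, fstype, options).

```shell
#!/bin/sh
# Sketch: print mount points that sit strictly below a directory.
# Reads /proc/mounts-style lines on stdin; field 2 is the mount point.
submounts_below() {
    # Strip a trailing slash, then match on "<base>/" as a prefix so that
    # /usr/share does not match /usr/shared.
    awk -v base="${1%/}" 'index($2, base "/") == 1 { print $2 }'
}

# Example: what already lives under /usr/share?
#   submounts_below /usr/share < /proc/mounts
```
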

Still this looks good!
