enhancements: Propose CoreOS layering #7

Merged Nov 22, 2021 (5 commits)
`os/coreos-layering.md`: 100 additions, 0 deletions
# CoreOS Layering

This enhancement proposes:

- A fundamental new addition to ostree/rpm-ostree: support for directly pulling and updating the OS from container images, while keeping all existing functionality (per-node package layering and our existing support for pulling via ostree on the wire).
- Documentation for generating derived (layered) images from the pristine CoreOS base image.
- Support for switching an FCOS system to use a custom image on first boot via Ignition.
- Zincati will continue to perform upgrades by inspecting the upgrade graph from the base image.

# Existing work

Originally, https://github.com/coreos/fedora-coreos-tracker/issues/812 tracked native support for "encapsulating" ostree commits in containers.

Then, it was realized that when shipping the OS as a container image, it feels natural to support users deriving from it. The bulk of this is really "ostree native container" integration glue, and is happening in https://github.com/ostreedev/ostree-rs-ext.

rpm-ostree vendors the ostree-rs-ext code and will also be extended to support the same interfaces as implemented by the "base" ostree-rs-ext code.

Specifically, as of today this functionality is exposed in:

- `rpm-ostree ex-container`
- `rpm-ostree rebase --experimental $containerref`

# Rationale

Since the creation of Container Linux (CoreOS) as well as Atomic Host, and continuing into the Fedora/RHEL CoreOS days, we have faced a constant tension around what we ship in the host system.

[This issue](https://github.com/coreos/fedora-coreos-tracker/issues/401) encapsulates much prior discussion.

For users who are happy with Fedora CoreOS today, not much will change.

For those who e.g. want to install custom agents or nontrivial amounts of code (such as kubelet), this "CoreOS layering" will be a powerful new mechanism to ship the code they need.

# Example via Dockerfile

[fcos-derivation-example](https://github.com/cgwalters/fcos-derivation-example) contains an example `Dockerfile` that builds a Go binary and injects it along with a corresponding systemd unit as a layer, building on top of Fedora CoreOS.

For ease of reference, a copy of the above is inline here:

```dockerfile
# Build a small Go program using a builder image
FROM registry.access.redhat.com/ubi8/ubi:latest as builder
WORKDIR /build
COPY . .
RUN yum -y install go-toolset
RUN go build hello-world.go

# In the future, this would be e.g. quay.io/coreos/fedora:stable
FROM quay.io/cgwalters/fcos-dev
# Inject it into Fedora CoreOS
COPY --from=builder /build/hello-world /usr/bin
# And add our unit file
ADD hello-world.service /etc/systemd/system/hello-world.service
# Also add strace; we don't yet support `yum install` but we can
# with some work in rpm-ostree!
RUN rpm -Uvh https://kojipkgs.fedoraproject.org//packages/strace/5.14/1.fc34/x86_64/strace-5.14-1.fc34.x86_64.rpm
```

# Derivation versus Ignition/Butane

This proposal does not replace Ignition. Ignition will still play at least two key roles:

- Setting up partitions and storage, e.g. LUKS is something configured via Ignition provided on boot.
- Machine/node specific configuration, in particular bootstrap configuration: e.g. static IP addresses that are necessary to fetch container images at all.
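Both of those roles can be sketched in a single Butane config. This is illustrative only: the device path, interface name, and addresses below are hypothetical placeholders, not values from this proposal.

```yaml
# Sketch only: device path, interface name, and addresses are hypothetical.
variant: fcos
version: 1.4.0
storage:
  luks:
    # Storage setup (e.g. LUKS) stays an Ignition-time concern
    - name: data
      device: /dev/disk/by-id/example-disk
  files:
    # Bootstrap networking: a static IP so the node can fetch container images at all
    - path: /etc/NetworkManager/system-connections/eth0.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=eth0
          type=ethernet
          interface-name=eth0

          [ipv4]
          method=manual
          address1=192.0.2.10/24,192.0.2.1
```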

# Butane as a declarative input format for layering

We must support `Dockerfile`, because it's the lowest common denominator for the container ecosystem, and is accepted as input for many tools.
**Member:**
Would be nice if there was a halfway approach where we do support the Dockerfile approach but the only supported operation is still to just apply a Butane config, potentially with some locally referenced files that were copied from a previous build stage.

Personally, my primary concerns with the "freehand" Dockerfile approach are:

1. It's too easy to make drastic changes. E.g. in rpm-ostree, we made a conscious decision to separate out pure package layering and base package overrides. That distinction does not exist here, and removing these guardrails makes it much easier for users to shoot themselves in the foot. I guess one thing we could do is have rpm-ostree detect when content was modified rather than purely added, and require an `--allow-base-overrides` switch or something?
2. It makes the configuration story messier. With this, you can now configure at both image build time and Ignition boot time, and each uses a completely different language. So I think there's potential there to cause user confusion. It also impacts support, which now needs to understand more types of inputs.

Focusing on Butane configs as the layering mechanism in my opinion helps with both of those because it's more declarative and so easier to analyze, and it's the same configuration language we've been speaking so far.

**Member:**

To be clear, I'm OK with this too, and I know we can always tweak things going forward (though it's always harder to add restrictions than it is to remove them). Just wanted to point out what IMO are some non-negligible risks of this approach.

**Member Author:**

> we have this conscious decision to separate out pure package layering and base package overrides. This distinction does not exist here,

That's not quite true, because we will still have the original base image. Reverting back to the "golden FCOS base" will also be just an `rpm-ostree rebase` away.

Additionally, because we will have `yum|dnf -> rpm-ostree` wrapping in this image (xref https://coreos.github.io/rpm-ostree/cliwrap/), we can actually still impose the install vs. override semantic.

Further, tooling can perform filesystem and rpmdb diffing between layers - we have all that code today.

**Member Author:**

> And they each use completely different languages.

Also, today one can use any tool to generate Ignition, which definitely happens (e.g. openshift-install generates Ignition via some custom Terraform stuff) and we support that.

More broadly I think we are in a position of needing to support "low level" as well as arbitrary configuration mechanisms, but we also try to have opinions on higher level tooling.

So here Dockerfile is that lowest common denominator for arbitrary, low level (kinda) mechanism - but we can streamline higher level workflows inside of that.

**Member:**

> > we have this conscious decision to separate out pure package layering and base package overrides. This distinction does not exist here,
>
> That's not quite true, because we will still have the original base image. Reverting back to the "golden FCOS base" will also be just an `rpm-ostree rebase` away.

Right, but the power a user wields in their Dockerfile sees it all as just one filesystem. E.g. nothing stops them from doing `COPY my-libc.so /usr/lib64/libc.so`, and the build system will be happy to build that. Whereas if it's Butane based, we can immediately tell they're modifying base content and fail the build. So the failure happens at image build time instead of deploy-and-reboot time. Of course, we do want to provide flexibility to modify base content, but IMO it should be explicitly opted into.

Combining this with ostreedev/ostree-rs-ext#159, maybe we could have this ostree container finalize step take that switch? E.g. ostree container finalize --allow-overrides.

> Additionally, because we will have `yum|dnf -> rpm-ostree` in this image (xref https://coreos.github.io/rpm-ostree/cliwrap/) we can actually still impose the install vs override semantic.

Hmm, can you expand on the UX you're thinking of here? `dnf install` will automatically upgrade an already installed package, so we'd have to require some added switch (related: coreos/rpm-ostree#2844 (comment)).

**Member Author:**

> Hmm, can you expand on the UX you're thinking of here? `dnf install` will automatically upgrade an already installed package, so we'd have to require some added switch (related: coreos/rpm-ostree#2844 (comment)).

Well, the cliwrap status quo today actually errors out when someone types `dnf install`. I know this is a perpetually confusing thing, but so far cliwrap does not necessarily involve invoking traditional `yum|dnf` logic. It opens the door to that, of course.

So specifically, if e.g. someone types `RUN yum install usbguard` in their Dockerfile, we would error out, and they'd have to do `RUN rpm-ostree install usbguard`, which would run through all of the same logic that exists today and would only layer, not upgrade dependencies. (There's...a lot implied in this whole "rpm-ostree in container builds" flow; it's really worth splitting out to a separate issue, will look at that.)

**Member Author (@cgwalters, Nov 17, 2021):**

The real status quo right now, to be clear, is that `dnf install usbguard` will give "command not found" and `rpm-ostree install usbguard` will error out because you're in a container. And `rpm -Uvh https://example.com/usbguard.rpm` will work.

**Member Author:**

```
> podman run --rm -ti quay.io/cgwalters/fcos bash
bash-5.1# dnf install usbguard
bash: dnf: command not found
bash-5.1# rpm-ostree install usbguard
error: This system was not booted via libostree; found $container=podman environment variable.
rpm-ostree is designed to manage host systems, not containers.

bash-5.1#
```

**Member:**

> So specifically, if e.g. someone types `RUN yum install usbguard` in their Dockerfile, we would error out, and they'd have to do `RUN rpm-ostree install usbguard`, which would run through all of the same logic that exists today and would only layer, not upgrade dependencies.

Ack, I like that.

What I'm trying to make sure of here is that OCP customers who will be interacting with this feature are still subject to basic rules by default, so that our QA efforts aren't trivially invalidated. I.e., I don't want them to be able to change `/usr` base content just as easily as they can `/etc` content. If the only way OCP customers interact with this is through Butane or MCs, for example, then I think that's fine. Otherwise, I think we should make them opt into modifying base content explicitly (and maybe e.g. we only support them doing this if they have a support exception).

Taking this feature in isolation (e.g. for FCOS users), typing rpm-ostree rebase $my_pullspec already implies some risk acceptance (though there's still value in keeping guardrails there too of course). But in the OCP case, the MCO will be what's running that command.


However, one does not have to use `Dockerfile` to make containers. Specifically, what would make a lot of sense for Fedora CoreOS is to focus
on Butane as a standard declarative interface to this process.

This could run as a "builder" container, something like this:

```dockerfile
FROM quay.io/coreos/butane:release as builder
WORKDIR /build
COPY . .
# Assumes the build context contains a Butane config; "example.bu" is a hypothetical name
RUN butane example.bu -o /build/ignition.json

FROM quay.io/fedora/coreos:stable
COPY --from=builder /build/ignition.json /tmp/
RUN ignition --write-filesystem /tmp/ignition.json && rm -f /tmp/ignition.json
```

Another option is to support being run nested inside an existing container tool, similar to
[kaniko](https://github.com/GoogleContainerTools/kaniko). Then no
`Dockerfile` would be needed.

# Use of CoreOS disk/boot images

More explicitly, it's expected that many, if not most, users would continue to use the official Fedora CoreOS "boot images" (e.g. ISO, AMI, qcow2, etc.). This proposal does *not* currently call for exposing a way for a user to create the boot-image shell around their custom container, although that is an obvious potential next step.

Hence, a user wanting to use a custom base image would provide machines with an Ignition config that performs e.g. `rpm-ostree rebase ostree-remote-image:quay.io/examplecorp/baseos:latest` as a systemd unit. It is likely that we would provide this via [Butane](https://github.com/coreos/butane) as well; for example:

```yaml
variant: fcos
version: 1.5.0
ostree_container:
  image: quay.io/mycorp/myfcos:stable
  reboot: true
```
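Until such Butane sugar exists, the systemd-unit approach described above can be sketched directly. This is illustrative only: the unit name and ordering dependencies are hypothetical, while the `rpm-ostree rebase` command and image reference are the ones from the example in the text.

```yaml
# Sketch only: unit name and ordering are hypothetical.
variant: fcos
version: 1.4.0
systemd:
  units:
    - name: custom-os-rebase.service
      enabled: true
      contents: |
        [Unit]
        Description=Rebase to custom OS container image
        # Only run on the very first boot, once networking is available
        ConditionFirstBoot=true
        After=network-online.target
        Wants=network-online.target

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/rpm-ostree rebase ostree-remote-image:quay.io/examplecorp/baseos:latest
        ExecStart=/usr/bin/systemctl --no-block reboot

        [Install]
        WantedBy=multi-user.target
```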