|
| 1 | +# Let Multiple Firecracker VMs Share a Root Filesystem with Copy-on-Write |
| 2 | + |
| 3 | +An overlay (copy-on-write) filesystem lets multiple microVMs share a common read-only |
| 4 | +filesystem on the host. Each microVM can still write changes to that filesystem |
| 5 | +by using its own overlay. By default, files are read from the underlying root filesystem. |
| 6 | +All changes are written to the overlay by copying the file and writing the modified |
| 7 | +copy. If such a copy exists on the overlay, it takes precedence over whatever is |
| 8 | +in the root filesystem. |
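
The lookup and copy-up behaviour described above can be imitated with two plain directories. The following is only a simulation for illustration, not how the kernel's overlay filesystem is implemented: `overlay_read`, `overlay_append`, and the `lower`/`upper` directory names are hypothetical, and no real overlay mount is created.

```bash
#!/bin/sh
# Simulation only: mimics overlay lookup and copy-up with plain directories.
set -eu

demo=$(mktemp -d)
lower="$demo/lower"   # stands in for the shared read-only root filesystem
upper="$demo/upper"   # stands in for one microVM's private write layer
mkdir -p "$lower" "$upper"
echo "from rootfs" > "$lower/motd"

# Read: prefer the copy in the upper layer, fall back to the lower layer.
overlay_read() {
    if [ -e "$upper/$1" ]; then
        cat "$upper/$1"
    else
        cat "$lower/$1"
    fi
}

# Write: copy the file up on first modification, then change only the copy.
overlay_append() {
    if [ ! -e "$upper/$1" ] && [ -e "$lower/$1" ]; then
        cp -p "$lower/$1" "$upper/$1"
    fi
    echo "$2" >> "$upper/$1"
}

before=$(overlay_read motd)
overlay_append motd "changed by vm1"
after=$(overlay_read motd)
lower_now=$(cat "$lower/motd")

echo "before write: $before"
echo "after write:  $after"
echo "lower layer:  $lower_now"
rm -rf "$demo"
```

The lower directory is never modified, which is why several microVMs can share it while each keeps its own upper layer.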
| 9 | + |
| 10 | +As used by [`firecracker-containerd`](https://github.com/firecracker-microvm/firecracker-containerd), |
| 11 | +this requires a `squashfs` root filesystem mounted read-only and a write layer, |
| 12 | +which can be either a temporary `tmpfs` in guest memory or a sparse `ext4` image |
| 13 | +file on the host. The latter method has the advantage that changes |
| 14 | +can be persisted across microVM reboots if required. |
| 15 | + |
| 16 | +Please note that this requires changes on the guest and is thus only possible |
| 17 | +if you control the guest's init. |
| 18 | + |
| 19 | +## Convert rootfs to squashfs |
| 20 | + |
| 21 | +If you already have an existing `rootfs` file formatted as `ext4`, e.g., created |
| 22 | +according to the [rootfs-and-kernel-setup](https://github.com/firecracker-microvm/firecracker/blob/main/docs/rootfs-and-kernel-setup.md) |
| 23 | +documentation, you can simply mount it and create a new `squashfs` formatted filesystem |
| 24 | +from that. |
| 25 | + |
| 26 | +This requires `mksquashfs`, which is available as part of the `squashfs-tools` |
| 27 | +package for your distribution. |
| 28 | + |
| 29 | +1. Create a mount point |
| 30 | + |
| 31 | + ```bash |
| 32 | + mkdir /tmp/my-rootfs |
| 33 | + ``` |
| 34 | + |
| 35 | +1. Mount the existing rootfs (e.g., `rootfs.ext4`). If you don't have an existing |
| 36 | + rootfs, you can skip this step and simply copy your files directly. |
| 37 | +
|
| 38 | + ```bash |
| 39 | + sudo mount rootfs.ext4 /tmp/my-rootfs |
| 40 | + ``` |
| 41 | +
|
| 42 | +1. Create necessary folders for mounting the overlay filesystem. These mount points |
| 43 | + have to be created now as the microVM will not be able to change anything on |
| 44 | + the read-only filesystem. |
| 45 | +
|
| 46 | + ```bash |
| 47 | + sudo mkdir -p /tmp/my-rootfs/overlay/root \ |
| 48 | + /tmp/my-rootfs/overlay/work \ |
| 49 | + /tmp/my-rootfs/mnt \ |
| 50 | + /tmp/my-rootfs/rom |
| 51 | + ``` |
| 52 | +
|
| 53 | +1. Create the `overlay-init` script (adapted from [overlay-init of firecracker-containerd](https://github.com/firecracker-microvm/firecracker-containerd/blob/main/tools/image-builder/files_debootstrap/sbin/overlay-init)). |
| 54 | +
|
| 55 | + ```bash |
| 56 | + cat > overlay-init <<'EOF' |
| 57 | + #!/bin/sh |
| 58 | + # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved |
| 59 | + # |
| 60 | + # Licensed under the Apache License, Version 2.0 (the "License"). You may |
| 61 | + # not use this file except in compliance with the License. A copy of the |
| 62 | + # License is located at |
| 63 | + # |
| 64 | + # <http://aws.amazon.com/apache2.0/> |
| 65 | + # |
| 66 | + # or in the "license" file accompanying this file. This file is distributed |
| 67 | + # on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either |
| 68 | + # express or implied. See the License for the specific language governing |
| 69 | + # permissions and limitations under the License |
| 70 | +
|
| 71 | + # Parameters |
| 72 | + # 1. rw_root -- path where the read/write root is mounted |
| 73 | + # 2. work_dir -- path to the overlay workdir (must be on same filesystem as rw_root) |
| 74 | + # Overlay will be set up on /mnt, original root on /mnt/rom |
| 75 | + pivot() { |
| 76 | + local rw_root work_dir |
| 77 | + rw_root="$1" |
| 78 | + work_dir="$2" |
| 79 | + /bin/mount \ |
| 80 | + -o noatime,lowerdir=/,upperdir=${rw_root},workdir=${work_dir} \ |
| 81 | + -t overlay "overlayfs:${rw_root}" /mnt |
| 82 | + pivot_root /mnt /mnt/rom |
| 83 | + } |
| 84 | +
|
| 85 | + # Overlay is configured under /overlay |
| 86 | + # Global variable $overlay_root is expected to be set to either |
| 87 | + # "ram", which configures a tmpfs as the rw overlay layer (this is |
| 88 | + # the default, if the variable is unset) |
| 89 | + # - or - |
| 90 | + # A block device name, relative to /dev, in which case it is assumed |
| 91 | + # to contain an ext4 filesystem suitable for use as a rw overlay |
| 92 | + # layer. e.g. "vdb" |
| 93 | + do_overlay() { |
| 94 | + local overlay_dir="/overlay" |
| 95 | + if [ "$overlay_root" = ram ] || |
| 96 | + [ -z "$overlay_root" ]; then |
| 97 | + /bin/mount -t tmpfs -o noatime,mode=0755 tmpfs /overlay |
| 98 | + else |
| 99 | + /bin/mount -t ext4 "/dev/$overlay_root" /overlay |
| 100 | + fi |
| 101 | + mkdir -p /overlay/root /overlay/work |
| 102 | + pivot /overlay/root /overlay/work |
| 103 | + } |
| 104 | +
|
| 105 | + # If we're given an overlay, ensure that it really exists. Panic if not |
| 106 | + if [ -n "$overlay_root" ] && |
| 107 | + [ "$overlay_root" != ram ] && |
| 108 | + [ ! -b "/dev/$overlay_root" ]; then |
| 109 | + echo -n "FATAL: " |
| 110 | + echo -n "Overlay root given as $overlay_root but " |
| 111 | + echo "/dev/$overlay_root does not exist" |
| 112 | + exit 1 |
| 113 | + fi |
| 114 | + |
| 115 | + do_overlay |
| 116 | + |
| 117 | + # invoke the actual system init program and proceed with the boot |
| 118 | + # process |
| 119 | + exec /sbin/init $@ |
| 120 | + EOF |
| 121 | + |
| 122 | + sudo install -m 0755 overlay-init /tmp/my-rootfs/sbin/overlay-init |
| 123 | + ``` |
| 124 | +
|
| 125 | +1. Create a `squashfs` formatted filesystem |
| 126 | +
|
| 127 | + ```bash |
| 128 | + sudo mksquashfs /tmp/my-rootfs rootfs.img -noappend |
| 129 | + ``` |
| 130 | +
|
| 131 | +1. Unmount the old rootfs (if mounted in step 2). |
| 132 | +
|
| 133 | + ```bash |
| 134 | + sudo umount /tmp/my-rootfs |
| 135 | + ``` |
| 136 | +
|
| 137 | +Now we have successfully prepared the rootfs. |
| 138 | +
|
| 139 | +## Creating an ext4 Formatted Persistent Overlay |
| 140 | +
|
| 141 | +To allow microVMs to save persistent files that are available after a reboot, we |
| 142 | +need to create an `ext4` image to use as an overlay. If data does not need to be |
| 143 | +available again after a reboot, you can skip this step, as it is possible to use |
| 144 | +an in-memory `tmpfs` as an overlay instead. |
| 145 | +
|
| 146 | +1. Create the image file. We will use a size of 1 GiB (1024 MiB), but this can |
| 147 | + be increased. |
| 148 | +
|
| 149 | + ```bash |
| 150 | + dd if=/dev/zero of=overlay.ext4 conv=sparse bs=1M count=1024 |
| 151 | + ``` |
| 152 | +
|
| 153 | + The file will be created as a sparse file, so that it only uses as much disk |
| 154 | + space as it currently needs. The file size may still be reported as 1 GiB |
| 155 | + (the file's _apparent size_). Note that this requires your host filesystem |
| 156 | + to support sparse files. The disk space it actually occupies can be checked with |
| 157 | + the following command (which should report 0 right now): |
| 158 | +
|
| 159 | + ```bash |
| 160 | + du -h overlay.ext4 |
| 161 | + ``` |
| 162 | +
|
| 163 | + `du` can also be used to report the apparent size of a file (1 GiB in this |
| 164 | + example): |
| 165 | +
|
| 166 | + ```bash |
| 167 | + du -h --apparent-size overlay.ext4 |
| 168 | + ``` |
| 169 | +
|
| 170 | +1. Create an `ext4` file system on the image file. |
| 171 | +
|
| 172 | + ```bash |
| 173 | + mkfs.ext4 overlay.ext4 |
| 174 | + ``` |
| 175 | +
|
| 176 | +Done! The overlay is ready now. Note that you need to create **one filesystem per |
| 177 | +microVM**. |
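
Because every microVM needs its own overlay image, the two steps above can be wrapped in a loop. This is a minimal sketch under stated assumptions: the `vm1`..`vm3` names, the `overlay-<name>.ext4` naming scheme, and the temporary output directory are hypothetical, and `truncate` (GNU coreutils) is used here in place of `dd` to create the sparse file.

```bash
#!/bin/sh
# Sketch: one sparse overlay image per microVM (names are hypothetical).
set -eu

overlay_dir=$(mktemp -d)      # assumption: any host directory would do
overlay_size=1G               # same 1 GiB apparent size as above

for vm in vm1 vm2 vm3; do
    img="$overlay_dir/overlay-$vm.ext4"
    # truncate extends the file without writing data, so it starts out sparse
    truncate -s "$overlay_size" "$img"
    # Format only if mkfs.ext4 is installed; -q is quiet, -F skips the
    # confirmation that may be asked for a regular file
    if command -v mkfs.ext4 >/dev/null 2>&1; then
        mkfs.ext4 -q -F "$img"
    fi
done

image_count=$(ls "$overlay_dir" | wc -l)
apparent_bytes=$(stat -c %s "$overlay_dir/overlay-vm1.ext4")
echo "created $image_count images of $apparent_bytes bytes each"
```

Each image is then attached to exactly one microVM as its write layer, while all microVMs keep pointing at the same `rootfs.img`.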
| 178 | +
|
| 179 | +## Configure the rootfs and Kernel Boot Parameters |
| 180 | +
|
| 181 | +To use the overlay filesystem, you need to adapt your Firecracker |
| 182 | +configuration and boot parameters for your microVMs. |
| 183 | +
|
| 184 | +First, mount the new `squashfs` root filesystem as read-only. Note that this step |
| 185 | +is optional but recommended. Simply set the `is_read_only` parameter in your Firecracker |
| 186 | +disk parameters to `true` for the root device. |
| 187 | +
|
| 188 | +Second, set the `init` parameter to `/sbin/overlay-init` to execute the initialization |
| 189 | +of our overlay filesystem before starting the rest of the microVM's init process. |
| 190 | +The script reads `overlay_root` from its environment, because the kernel passes boot |
| 191 | +parameters it does not recognize to init as environment variables. If `overlay_root` is `ram` or unset, a `tmpfs` is created and used as the write layer. Otherwise, add `overlay.ext4` as a second drive |
| 192 | +and set `overlay_root` to `vdb` (or attach it as a third drive and set it to `vdc`, etc.). |
| 193 | +
|
| 194 | +```json |
| 195 | +{ |
| 196 | + "boot-source": { |
| 197 | + "kernel_image_path": "vmlinux", |
| 198 | + "boot_args": "console=ttyS0 reboot=k panic=1 pci=off overlay_root=vdb init=/sbin/overlay-init" |
| 199 | + }, |
| 200 | + "drives": [ |
| 201 | + { |
| 202 | + "drive_id": "rootfs", |
| 203 | + "path_on_host": "rootfs.img", |
| 204 | + "is_root_device": true, |
| 205 | + "is_read_only": true |
| 206 | + }, |
| 207 | + { |
| 208 | + "drive_id": "overlayfs", |
| 209 | + "path_on_host": "overlay.ext4", |
| 210 | + "is_root_device": false |
| 211 | + } |
| 212 | + ], |
| 213 | + "machine-config": { |
| 214 | + "vcpu_count": 2, |
| 215 | + "mem_size_mib": 1024 |
| 216 | + } |
| 217 | +} |
| 218 | +``` |
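
When several microVMs share the same `rootfs.img`, their configurations differ only in the overlay drive. The sketch below generates one configuration file per microVM from the example above. `write_vm_config`, the `vm1`/`vm2` names, and the `overlay-<name>.ext4` file names are hypothetical and only serve as an illustration.

```bash
#!/bin/sh
# Sketch: write one Firecracker config per microVM; only the overlay path differs.
set -eu

# write_vm_config is a hypothetical helper, not part of Firecracker.
# $1 = microVM name, $2 = output directory
write_vm_config() {
    # Unquoted heredoc on purpose: $1 must expand into the overlay path.
    cat > "$2/$1.json" <<EOF
{
  "boot-source": {
    "kernel_image_path": "vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off overlay_root=vdb init=/sbin/overlay-init"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "rootfs.img",
      "is_root_device": true,
      "is_read_only": true
    },
    {
      "drive_id": "overlayfs",
      "path_on_host": "overlay-$1.ext4",
      "is_root_device": false
    }
  ],
  "machine-config": {
    "vcpu_count": 2,
    "mem_size_mib": 1024
  }
}
EOF
}

config_dir=$(mktemp -d)
for vm in vm1 vm2; do
    write_vm_config "$vm" "$config_dir"
done

# Count the configs that reference the shared root filesystem image.
shared=$(grep -l '"path_on_host": "rootfs.img"' "$config_dir"/*.json | wc -l)
echo "$shared configs share rootfs.img"
```

Each generated file could then be passed to its own Firecracker process, for example with `firecracker --api-sock /tmp/vm1.sock --config-file vm1.json`, assuming the kernel and image paths resolve from the working directory.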