Support container restore through CRI/Kubernetes
This implements container restore as described in:

https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/#restore-checkpointed-container-standalone

For detailed step-by-step instructions, also see contrib/checkpoint/checkpoint-restore-cri-test.sh

The code changes are based on changes I made to Podman around 2018
and to CRI-O around 2020.

The history behind restoring containers via CRI/Kubernetes probably
requires some explanation. The initial proposal to bring
checkpoint/restore to Kubernetes targeted pod checkpointing and
restoring, together with the corresponding CRI changes:

kubernetes-sigs/cri-tools#662
kubernetes/kubernetes#97194

After discussing this topic for about two years, another approach was
implemented, as described in KEP-2008:

kubernetes/enhancements#2008

"Forensic Container Checkpointing" allowed us to separate checkpointing
from restoring. For the "Forensic Container Checkpointing" it is enough
to create a checkpoint of the container. Restoring is not necessary as
the analysis of the checkpoint archive can happen without restoring the
container.

While thinking about a way to restore a container, we started, more or
less by coincidence, to look into restoring containers in Kubernetes
via Create and Start. In CRI-O this is done by figuring out during
Create whether the container image is a checkpoint image and, if so,
taking a different code path. This change now implements the same
approach in containerd; a minimal detection sketch follows.
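
A minimal, illustrative Go sketch of this kind of detection; the helper
name is hypothetical, and the annotation key is the one the test script
below attaches to the converted checkpoint image with buildah:

    // isCheckpointImage reports whether an image looks like a CRIU checkpoint
    // image by checking for the checkpoint annotation. Hypothetical helper for
    // illustration only; the real containerd code path may differ.
    func isCheckpointImage(annotations map[string]string) bool {
            _, ok := annotations["org.criu.checkpoint.container.name"]
            return ok
    }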

With this change it is possible to restore a container from a
checkpoint tar archive that was created during checkpointing via CRI.

To restore a container via Kubernetes, we convert the tar archive to an
OCI image as described in the kubernetes.io blog post linked above.
Using this OCI image it is possible to restore a container in
Kubernetes.

At this point I think it should be doable to restore containers in
CRI-O and containerd regardless of whether they were created by
containerd or CRI-O. The biggest difference is the container metadata,
which can be adapted during restore.
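
On the client side, this commit adds a Restore() method to the
Container interface (see client/container.go below). A minimal usage
sketch, assuming the containerd 2.x import layout; the socket path,
namespace, container ID, and checkpoint directory are placeholders:

    package main

    import (
            "context"
            "log"

            containerd "github.com/containerd/containerd/v2/client"
            "github.com/containerd/containerd/v2/pkg/cio"
            "github.com/containerd/containerd/v2/pkg/namespaces"
    )

    func main() {
            // Connect to containerd and use the Kubernetes namespace.
            client, err := containerd.New("/run/containerd/containerd.sock")
            if err != nil {
                    log.Fatal(err)
            }
            defer client.Close()
            ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

            // Load a previously created container and restore it from the
            // directory holding the extracted checkpoint data.
            container, err := client.LoadContainer(ctx, "my-container")
            if err != nil {
                    log.Fatal(err)
            }
            pid, err := container.Restore(ctx, cio.NewCreator(cio.WithStdio), "/path/to/checkpoint")
            if err != nil {
                    log.Fatal(err)
            }
            log.Printf("restored init process PID: %d", pid)
    }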

Open items:

 * It is not clear to me why restoring a container in containerd goes
   through task/Create(). But as that restore code already exists, this
   change extends the existing restore code path in task/Create() to
   also cover containers restored through the CRI via Create and Start.
 * Automatic image pulling. containerd does not pull images
   automatically when containers are created via the CRI. crictl has
   an option to pull images before starting, but that uses the CRI
   image pull interface, so pull and create are still separate
   operations. Restoring containers from an OCI image is a bit
   different: the checkpoint OCI image does not include the base
   image, only a reference to it (NAME@DIGEST). Using crictl with
   pulling enabled will pull the checkpoint image, but not the base
   image the checkpoint is based on. So while preparing the restore,
   containerd automatically pulls the base image, but I was not able
   to find a way to pull an image synchronously in containerd, so
   there is a for loop waiting for the container image to appear in
   the internal store (see the sketch after this list). I think this
   can probably be implemented better.
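
A simplified sketch of the polling loop described in the second item,
assuming the containerd 2.x client API and the usual imports (context,
fmt, time, and the containerd client package); the function name,
timeout, and polling interval are illustrative:

    // waitForImage polls containerd's image store until the given reference
    // appears or the timeout expires. Stand-in for the for loop mentioned
    // above; client.GetImage is an existing containerd client call.
    func waitForImage(ctx context.Context, client *containerd.Client, ref string, timeout time.Duration) error {
            deadline := time.Now().Add(timeout)
            for {
                    if _, err := client.GetImage(ctx, ref); err == nil {
                            return nil
                    }
                    if time.Now().After(deadline) {
                            return fmt.Errorf("timed out waiting for image %q", ref)
                    }
                    time.Sleep(100 * time.Millisecond)
            }
    }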

Anyway, this is a first step towards container restore in Kubernetes
when using containerd.

Signed-off-by: Adrian Reber <areber@redhat.com>
adrianreber committed Jul 23, 2024
1 parent 323ba43 commit f7a1f41
Showing 29 changed files with 2,043 additions and 172 deletions.
1 change: 1 addition & 0 deletions api/go.mod
@@ -20,4 +20,5 @@ require (
golang.org/x/net v0.23.0 // indirect
golang.org/x/sys v0.18.0 // indirect
golang.org/x/text v0.14.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)
4 changes: 2 additions & 2 deletions api/go.sum
@@ -76,5 +76,5 @@ google.golang.org/protobuf v1.33.0 h1:uNO2rsAINq/JlFpSdYEKIZ0uKD/R9cpdv0T+yoGwGm
google.golang.org/protobuf v1.33.0/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b h1:h8qDotaEPuJATrMmW04NCwg7v22aHH28wwpauUhK9Oo=
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
127 changes: 95 additions & 32 deletions client/container.go
@@ -82,6 +82,9 @@ type Container interface {
Update(context.Context, ...UpdateContainerOpts) error
// Checkpoint creates a checkpoint image of the current container
Checkpoint(context.Context, string, ...CheckpointOpts) (Image, error)
// Restore restores a container and returns the PID of the
// restored container's init process.
Restore(context.Context, cio.Creator, string) (int, error)
}

func containerFromRecord(client *Client, c containers.Container) *container {
@@ -229,41 +232,13 @@ func (c *container) NewTask(ctx context.Context, ioCreate cio.Creator, opts ...N
Stdout: cfg.Stdout,
Stderr: cfg.Stderr,
}
r, err := c.get(ctx)
if err != nil {
if err := c.handleMounts(ctx, request); err != nil {
return nil, err
}
if r.SnapshotKey != "" {
if r.Snapshotter == "" {
return nil, fmt.Errorf("unable to resolve rootfs mounts without snapshotter on container: %w", errdefs.ErrInvalidArgument)
}

// get the rootfs from the snapshotter and add it to the request
s, err := c.client.getSnapshotter(ctx, r.Snapshotter)
if err != nil {
return nil, err
}
mounts, err := s.Mounts(ctx, r.SnapshotKey)
if err != nil {
return nil, err
}
spec, err := c.Spec(ctx)
if err != nil {
return nil, err
}
for _, m := range mounts {
if spec.Linux != nil && spec.Linux.MountLabel != "" {
if ml := label.FormatMountLabel("", spec.Linux.MountLabel); ml != "" {
m.Options = append(m.Options, ml)
}
}
request.Rootfs = append(request.Rootfs, &types.Mount{
Type: m.Type,
Source: m.Source,
Target: m.Target,
Options: m.Options,
})
}
r, err := c.get(ctx)
if err != nil {
return nil, err
}
info := TaskInfo{
runtime: r.Runtime.Name,
@@ -323,6 +298,94 @@ func (c *container) Update(ctx context.Context, opts ...UpdateContainerOpts) err
return nil
}

func (c *container) handleMounts(ctx context.Context, request *tasks.CreateTaskRequest) error {
r, err := c.get(ctx)
if err != nil {
return err
}

if r.SnapshotKey != "" {
if r.Snapshotter == "" {
return fmt.Errorf("unable to resolve rootfs mounts without snapshotter on container: %w", errdefs.ErrInvalidArgument)
}

// get the rootfs from the snapshotter and add it to the request
s, err := c.client.getSnapshotter(ctx, r.Snapshotter)
if err != nil {
return err
}
mounts, err := s.Mounts(ctx, r.SnapshotKey)
if err != nil {
return err
}
spec, err := c.Spec(ctx)
if err != nil {
return err
}
for _, m := range mounts {
if spec.Linux != nil && spec.Linux.MountLabel != "" {
if ml := label.FormatMountLabel("", spec.Linux.MountLabel); ml != "" {
m.Options = append(m.Options, ml)
}
}
request.Rootfs = append(request.Rootfs, &types.Mount{
Type: m.Type,
Source: m.Source,
Target: m.Target,
Options: m.Options,
})
}
}

return nil
}

func (c *container) Restore(ctx context.Context, ioCreate cio.Creator, rootDir string) (int, error) {
errorPid := -1
i, err := ioCreate(c.id)
if err != nil {
return errorPid, err
}
defer func() {
if err != nil && i != nil {
i.Cancel()
i.Close()
}
}()
cfg := i.Config()

request := &tasks.CreateTaskRequest{
ContainerID: c.id,
Terminal: cfg.Terminal,
Stdin: cfg.Stdin,
Stdout: cfg.Stdout,
Stderr: cfg.Stderr,
}

if err := c.handleMounts(ctx, request); err != nil {
return errorPid, err
}

request.Checkpoint = &types.Descriptor{
Annotations: map[string]string{
// The following annotation is used to restore a checkpoint
// via CRI. This is mainly used to restore a container
// in Kubernetes.
"criRestoreFromDirectory": rootDir,
},
}
// (adrianreber): it is not totally clear to me, but it seems the only
// way to restore a container in containerd is going through Create().
// This function sets up Create() in such a way that it handles
// container restore coming through the CRI.
response, err := c.client.TaskService().Create(ctx, request)
if err != nil {
return errorPid, errdefs.FromGRPC(err)
}

return int(response.GetPid()), nil
}

func (c *container) Checkpoint(ctx context.Context, ref string, opts ...CheckpointOpts) (Image, error) {
index := &ocispec.Index{
Versioned: ver.Versioned{
11 changes: 11 additions & 0 deletions contrib/checkpoint/checkcriu.go
@@ -0,0 +1,11 @@
package main

import (
criu "github.com/checkpoint-restore/go-criu/v7/utils"
)

func main() {
if err := criu.CheckForCriu(criu.PodCriuVersion); err != nil {
panic(err)
}
}
113 changes: 113 additions & 0 deletions contrib/checkpoint/checkpoint-restore-cri-test.sh
@@ -0,0 +1,113 @@
#!/usr/bin/env bash

set -eu -o pipefail

DIR=$(dirname "${0}")

cd "${DIR}"
go build -o checkcriu

if ! "./checkcriu"; then
echo "ERROR: CRIU check failed"
exit 1
fi

if [ ! -e "$(command -v crictl)" ]; then
echo "ERROR: crictl binary not found"
exit 1
fi

TESTDIR=$(mktemp -d)

function cleanup() {
rm -f ./checkcriu
rm -rf "${TESTDIR}"
}
trap cleanup EXIT

TESTDATA=testdata
# shellcheck disable=SC2034
CONTAINER_RUNTIME_ENDPOINT="unix:///run/containerd/containerd.sock"

function test_from_archive() {
crictl -t 5s rmp -fa
crictl pull quay.io/crio/fedora-crio-ci:latest
POD_JSON=$(mktemp)
# adapt the log directory
jq ".log_directory=\"${TESTDIR}\"" "$TESTDATA"/sandbox_config.json >"$POD_JSON"
pod_id=$(crictl runp "$POD_JSON")
ctr_id=$(crictl create "$pod_id" "$TESTDATA"/container_sleep.json "$POD_JSON")
crictl start "$ctr_id"
lines_before=$(crictl logs "$ctr_id" | wc -l)
# changes file system to see if changes are included in the checkpoint
crictl exec "$ctr_id" touch /etc/testfile
crictl exec "$ctr_id" rm /etc/exports
crictl -t 10s checkpoint --export="$TESTDIR"/cp.tar "$ctr_id"
crictl rm -f "$ctr_id"
crictl rmp -f "$pod_id"
crictl rmi quay.io/crio/fedora-crio-ci:latest
crictl images
pod_id=$(crictl runp "$POD_JSON")
# Replace original container with checkpoint image
RESTORE_JSON=$(mktemp)
jq ".image.image=\"$TESTDIR/cp.tar\"" "$TESTDATA"/container_sleep.json >"$RESTORE_JSON"
ctr_id=$(crictl create "$pod_id" "$RESTORE_JSON" "$POD_JSON")
rm -f "$RESTORE_JSON" "$POD_JSON"
crictl start "$ctr_id"
sleep 1
lines_after=$(crictl logs "$ctr_id" | wc -l)
if [ "$lines_before" -ge "$lines_after" ]; then
echo "number of lines after checkpointing ($lines_after) " \
"should be larger than before checkpointing ($lines_before)"
false
fi
# Cleanup
crictl rmi quay.io/crio/fedora-crio-ci:latest
crictl exec "$ctr_id" ls -la /etc/testfile
if crictl exec "$ctr_id" ls -la /etc/exports >/dev/null 2>&1; then
echo "error: file /etc/exports should not exist but it does"
exit 1
fi
}

function test_from_oci() {
crictl -t 5s rmp -fa
crictl pull quay.io/crio/fedora-crio-ci:latest
pod_id=$(crictl runp "$TESTDATA"/sandbox_config.json)
ctr_id=$(crictl create "$pod_id" "$TESTDATA"/container_sleep.json "$TESTDATA"/sandbox_config.json)
crictl start "$ctr_id"
crictl -t 10s checkpoint --export="$TESTDIR"/cp.tar "$ctr_id"
crictl rm -f "$ctr_id"
crictl rmp -f "$pod_id"
crictl rmi quay.io/crio/fedora-crio-ci:latest
crictl images
# Change cgroup of new sandbox
RESTORE_POD_JSON=$(mktemp)
jq ".linux.cgroup_parent=\"different_cgroup_789\"" "$TESTDATA"/sandbox_config.json >"$RESTORE_POD_JSON"
pod_id=$(crictl runp "$RESTORE_POD_JSON")
# Replace original container with checkpoint image
RESTORE_JSON=$(mktemp)
# Convert tar checkpoint archive to OCI image
newcontainer=$(buildah from scratch)
buildah add "$newcontainer" "$TESTDIR"/cp.tar /
buildah config --annotation=org.criu.checkpoint.container.name=test "$newcontainer"
buildah commit "$newcontainer" checkpoint-image:latest
buildah rm "$newcontainer"
# Export OCI image to disk
podman image save --format oci-archive -o "$TESTDIR"/oci.tar localhost/checkpoint-image:latest
buildah rmi localhost/checkpoint-image:latest
# Remove potentially old version of the checkpoint image
../../bin/ctr -n k8s.io images rm localhost/checkpoint-image:latest
# Import image
../../bin/ctr -n k8s.io images import "$TESTDIR"/oci.tar
jq ".image.image=\"localhost/checkpoint-image:latest\"" "$TESTDATA"/container_sleep.json >"$RESTORE_JSON"
ctr_id=$(crictl create "$pod_id" "$RESTORE_JSON" "$RESTORE_POD_JSON")
rm -f "$RESTORE_JSON" "$RESTORE_POD_JSON"
crictl start "$ctr_id"
# Cleanup
../../bin/ctr -n k8s.io images rm localhost/checkpoint-image:latest
crictl rmi quay.io/crio/fedora-crio-ci:latest
}

test_from_archive
test_from_oci
