Add example for Kubernetes #15

Merged
merged 1 commit into from
Oct 9, 2024
37 changes: 37 additions & 0 deletions Dockerfile.k8s
@@ -0,0 +1,37 @@
# syntax=docker/dockerfile:1.3-labs

FROM example-spring-boot-checkpoint
RUN apt-get update && apt-get install -y ncat
ENV CRAC_FILES_DIR=/cr

# This script is going to be used in the checkpointing job
COPY <<'EOF' /checkpoint.sh
#!/bin/sh

mkdir -p $CRAC_FILES_DIR
rm $CRAC_FILES_DIR/* || true

# After receiving a connection on port 1111, trigger the checkpoint
(nc -v -l -p 1111 && jcmd example-spring-boot.jar JDK.checkpoint) &
# We cannot `exec java ...` here: the JVM exits with code 137 after the checkpoint,
# and the pod would then be marked as failed
java -XX:CRaCCheckpointTo=$CRAC_FILES_DIR -XX:CRaCMinPid=128 -jar /example-spring-boot.jar &
PID=$!
trap "kill $PID" SIGINT SIGTERM
wait $PID || true
EOF

COPY <<'EOF' /restore-or-start.sh
#!/bin/sh

if [ -z "$(ls -A $CRAC_FILES_DIR 2> /dev/null)" ]; then
  echo "No checkpoint found, starting the application normally..."
  exec java -jar /example-spring-boot.jar
else
  echo "Checkpoint is present, restoring the application..."
  exec java -XX:CRaCRestoreFrom=$CRAC_FILES_DIR
fi
EOF

ENTRYPOINT [ "bash" ]
CMD [ "/restore-or-start.sh" ]
74 changes: 74 additions & 0 deletions README.md
@@ -135,3 +135,77 @@ export URL=$(gcloud run services describe example-spring-boot-direct --format 'v
curl $URL
Greetings from Spring Boot!
```

## Preparing checkpoint and running in Kubernetes

One way to run in Kubernetes is to perform the checkpoint locally or as part of the Docker build, as we have done in the previous examples. Here we will show how to do it end-to-end inside Kubernetes.

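Let's begin by starting a new Minikube cluster. `eval $(minikube docker-env)` points the local Docker client at Minikube's Docker daemon, so the images we build are available to the cluster without a registry. We will create a new namespace `example` and use it for the demo: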

```bash
minikube start
eval $(minikube docker-env)
kubectl create ns example
kubectl config set-context --current --namespace=example
```

Now we can build an image using `Dockerfile.k8s`, based on `example-spring-boot-checkpoint`, an image that contains the built application. We add the `netcat` utility and two scripts:
* `checkpoint.sh` starts the application with `-XX:CRaCCheckpointTo=...` and a `netcat` server listening on port 1111. When somebody connects to this port, the checkpoint is triggered via `jcmd`.
* `restore-or-start.sh` checks for checkpoint image files and either restores from that image or falls back to a regular application startup.

```bash
docker build -f Dockerfile.checkpoint -t example-spring-boot-checkpoint .
docker build -f Dockerfile.k8s -t example-spring-boot-k8s .
```
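
The decision made by `restore-or-start.sh` is small enough to sketch as a function. The helper name `checkpoint_mode` is an assumption for illustration and is not part of the image; the emptiness test is the same `ls -A` check that the script uses.

```shell
#!/bin/sh
# checkpoint_mode DIR: print "restore" if DIR holds any files, "start" otherwise.
# Hypothetical helper that mirrors the test in restore-or-start.sh.
checkpoint_mode() {
    dir="$1"
    # `ls -A` lists every entry except . and .., so empty output means the
    # directory is empty or does not exist (errors are discarded).
    if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo start
    else
        echo restore
    fi
}
```

Calling `checkpoint_mode "$CRAC_FILES_DIR"` prints `start` while the volume is still empty and `restore` once the checkpoint files have been written.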

Now we can apply the resources from `k8s.yaml`. The file defines a PersistentVolumeClaim for the checkpoint storage (in Minikube it is bound automatically to a PersistentVolume), a Deployment that runs the application through the `restore-or-start.sh` script, and a Job that creates the checkpoint image. Apply it and observe that two pods have been created:

```bash
kubectl apply -f k8s.yaml
kubectl get po
```
```
NAME                                 READY   STATUS    RESTARTS   AGE
create-checkpoint-fsfs4              2/2     Running   0          4s
example-spring-boot-68b69cc8-bbxnx   1/1     Running   0          4s
```

When you explore the application logs (`kubectl logs example-spring-boot-68b69cc8-bbxnx`) you will find that the application started normally, because the checkpoint image had not been created yet. The other pod hosts two containers: one runs `checkpoint.sh`, and the other warms the application up using `siege` and then triggers the checkpoint by connecting to port 1111 (this is not a built-in feature; remember that `netcat` is listening in the background).
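
The warmup container waits for port 8080 in an unbounded loop, so a pod whose application never starts would wait forever. As a sketch only, a bounded variant could look like this; `wait_for` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary assumptions.

```shell
#!/bin/sh
# wait_for ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds after each failure. Returns 1 if it never succeeds.
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Possible use in the warmup container instead of the unbounded loop:
#   wait_for 600 0.1 nc -z localhost 8080 || exit 1
```

With a bound in place, the warmup container fails instead of hanging, and since the Job uses `restartPolicy: Never` the failure shows up in the pod status.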

After a while the job completes:

```bash
kubectl get job
NAME                STATUS     COMPLETIONS   DURATION   AGE
create-checkpoint   Complete   1/1           19s        44m
```
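
The job is reported as `Complete` even though the JVM ends with exit code 137 after the checkpoint, because `checkpoint.sh` runs `java` in the background and discards its exit status. A minimal sketch of that pattern follows; `run_and_mask` is a hypothetical helper standing in for the last lines of the script.

```shell
#!/bin/sh
# run_and_mask CMD...: run CMD in the background, forward INT/TERM to it,
# and return 0 regardless of how CMD ends.
# Hypothetical helper mirroring the tail of checkpoint.sh.
run_and_mask() {
    "$@" &
    pid=$!
    # Forward termination signals so the process can still be stopped.
    trap 'kill "$pid" 2>/dev/null' INT TERM
    # `wait` reports the child's status (137 after SIGKILL); `|| true` hides it.
    wait "$pid" || true
}
```

Had the script used `exec java ...` instead, the container's exit status would be that of the JVM, which is the case the comment in `checkpoint.sh` warns about.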

Now you can roll out the deployment again, this time restoring the application from the checkpoint image:

```bash
kubectl rollout restart deployment/example-spring-boot
```
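
If you would rather script these last steps than poll by hand, something like the following may work. It is a sketch under assumptions: `finish_rollout` is a hypothetical helper, the timeouts are arbitrary, and the resource names are the ones from `k8s.yaml`.

```shell
#!/bin/sh
# finish_rollout [NAMESPACE]: wait for the checkpoint Job, restart the
# Deployment, and wait for the new pod. Hypothetical helper; timeouts are guesses.
finish_rollout() {
    ns="${1:-example}"
    # Block until the Job reports the Complete condition or the timeout expires.
    kubectl wait --for=condition=complete job/create-checkpoint \
        --namespace "$ns" --timeout=300s || return 1
    # Recreate the pod so that restore-or-start.sh finds the checkpoint files.
    kubectl rollout restart deployment/example-spring-boot \
        --namespace "$ns" || return 1
    # Wait until the replacement pod is ready.
    kubectl rollout status deployment/example-spring-boot \
        --namespace "$ns" --timeout=120s
}
```

Running `finish_rollout` with no argument targets the `example` namespace used throughout this demo.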

After a short moment the application is back up:

```
NAME                                   READY   STATUS      RESTARTS   AGE
create-checkpoint-fsfs4                0/2     Completed   0          95s
example-spring-boot-79b98966db-ml2pj   1/1     Running     0          15s
```

In the logs you can see that it performed the restore:

```
2024-09-30T07:52:11.858Z INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Restarting Spring-managed lifecycle beans after JVM restore
2024-09-30T07:52:11.866Z INFO 129 --- [Attach Listener] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path ''
2024-09-30T07:52:11.868Z INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Spring-managed lifecycle restart completed (restored JVM running for 45 ms)
```

Finally, let's verify that the application responds to our requests. You should get the "Greetings from Spring Boot!" reply:

```bash
kubectl expose deployment example-spring-boot --type=NodePort --port=8080
URL=$(minikube service example-spring-boot -n example --url)
curl $URL
```
88 changes: 88 additions & 0 deletions k8s.yaml
@@ -0,0 +1,88 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: crac-image
  namespace: example
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  storageClassName: "standard"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: create-checkpoint
  namespace: example
spec:
  template:
    spec:
      containers:
        - name: workload
          image: example-spring-boot-k8s
          imagePullPolicy: IfNotPresent
          env:
            - name: CRAC_FILES_DIR
              value: /var/crac/image
          args:
            - /checkpoint.sh
          securityContext:
            capabilities:
              add:
                - CHECKPOINT_RESTORE
                - SYS_PTRACE
          volumeMounts:
            - mountPath: /var/crac
              name: crac-image
        - name: warmup
          image: jstarcher/siege
          imagePullPolicy: IfNotPresent
          command:
            - /bin/sh
            - -c
            - |
              while ! nc -z localhost 8080; do sleep 0.1; done
              siege -c 1 -r 100000 -b http://localhost:8080
              echo "Do checkpoint, please" | nc -v localhost 1111
      restartPolicy: Never
      volumes:
        - name: crac-image
          persistentVolumeClaim:
            claimName: crac-image
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-spring-boot
  namespace: example
  labels:
    app: example-spring-boot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-spring-boot
  template:
    metadata:
      labels:
        app: example-spring-boot
    spec:
      containers:
        - name: workload
          image: example-spring-boot-k8s
          imagePullPolicy: IfNotPresent
          env:
            - name: CRAC_FILES_DIR
              value: /var/crac/image
          ports:
            - containerPort: 8080
          volumeMounts:
            - mountPath: /var/crac
              name: crac-image
      volumes:
        - name: crac-image
          persistentVolumeClaim:
            claimName: crac-image
            readOnly: true