Runner pod ephemerality with emptyDir #481

Open
@joshrichards37

Description

Hi there,

I am in the process of implementing the operator in our k8s cluster, and everything has been great and straightforward so far.

I just have a question about the ephemerality of the pods. I have tried using the myoung34 derivative of the container image and passing the EPHEMERAL env var through. This does seem to restart the runner container, which is great, but it does not restart the pod, which means the emptyDir volumes don't get recreated and persist on the cluster node.
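To illustrate the distinction: an emptyDir volume is tied to the lifetime of the pod, not of the container, so a container restart leaves its contents in place. Below is a minimal sketch of that lifetime rule as a toy model in Go. The `pod` and `emptyDir` types and the pod names are hypothetical and only simulate the behaviour; none of this is Kubernetes or operator code.

```go
package main

import "fmt"

// emptyDir models a scratch volume whose contents live as long as the pod.
type emptyDir struct{ files []string }

// pod is a hypothetical, simplified model: one container plus one emptyDir.
type pod struct {
	name     string
	restarts int
	scratch  *emptyDir
}

// newPod models pod creation: every new pod starts with an empty volume.
func newPod(name string) *pod {
	return &pod{name: name, scratch: &emptyDir{}}
}

// runJob simulates a workflow job leaving output on the volume.
func (p *pod) runJob(job string) {
	p.scratch.files = append(p.scratch.files, job+"/output")
}

// restartContainer models an in-place runner restart: the container
// comes back, but the pod and its emptyDir are untouched.
func (p *pod) restartContainer() { p.restarts++ }

func main() {
	p := newPod("runner-a") // hypothetical pod name
	p.runJob("job-1")
	p.restartContainer()
	p.runJob("job-2")
	fmt.Println("after container restart:", len(p.scratch.files), "leftover entries")

	// Deleting the pod and creating a replacement yields a fresh volume.
	q := newPod("runner-b")
	fmt.Println("after pod recreation:", len(q.scratch.files), "leftover entries")
}
```

In this model, data only goes away when the pod itself is replaced, which matches the behaviour described above.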

Using the myoung34 derivative also doesn't seem to work with the runner reconciliation, meaning that autoscaling isn't working for me right now. Here are some logs when using the derivative:

2022-08-31T11:50:17.994Z	INFO	controllers.GithubActionRunner	Registration token expired, updating	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:50:18.236Z	INFO	controllers.GithubActionRunner	Unregistering runner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox", "name": "runner-poolsandbox-pod-hh2bv", "id": 9895}
2022-08-31T11:50:18.613Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:50:18.843Z	INFO	controllers.GithubActionRunner	Scaling up	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox", "numInstances": 1}
2022-08-31T11:50:18.868Z	INFO	controllers.GithubActionRunner	Creating a new Pod	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox", "Pod.Namespace": "github-actions-runner-operator", "Pod.Name": "runner-poolsandbox-pod-9ll4b", "result": "created"}
2022-08-31T11:50:18.869Z	DEBUG	events	Normal	{"object": {"kind":"GithubActionRunner","namespace":"github-actions-runner-operator","name":"runner-poolsandbox","uid":"0f373e1e-2712-45ea-9a1a-d7dc974533f7","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"1775143"}, "reason": "Scaling", "message": "Created pod github-actions-runner-operator/runner-poolsandbox-pod-9ll4b"}
2022-08-31T11:50:18.876Z	DEBUG	events	Warning	{"object": {"kind":"GithubActionRunner","namespace":"github-actions-runner-operator","name":"runner-poolsandbox","uid":"0f373e1e-2712-45ea-9a1a-d7dc974533f7","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"1775143"}, "reason": "ProcessingError", "message": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"runner-poolsandbox\": the object has been modified; please apply your changes to the latest version and try again"}
2022-08-31T11:50:18.884Z	ERROR	util.api	unable to update status	{"error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"runner-poolsandbox\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).manageOutcome
	/workspace/controllers/githubactionrunner_controller.go:181
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).handleScaling
	/workspace/controllers/githubactionrunner_controller.go:137
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).Reconcile
	/workspace/controllers/githubactionrunner_controller.go:97
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227
2022-08-31T11:50:18.884Z	ERROR	controller.githubactionrunner	Reconciler error	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "name": "runner-poolsandbox", "namespace": "github-actions-runner-operator", "error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"runner-poolsandbox\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227
2022-08-31T11:50:18.884Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:50:19.118Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:50:19.131Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:50:19.460Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:50:19.469Z	ERROR	util.api	unable to update status	{"error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"runner-poolsandbox\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).manageOutcome
	/workspace/controllers/githubactionrunner_controller.go:181
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).handleScaling
	/workspace/controllers/githubactionrunner_controller.go:122
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).Reconcile
	/workspace/controllers/githubactionrunner_controller.go:97
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227
2022-08-31T11:50:19.469Z	ERROR	controller.githubactionrunner	Reconciler error	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "name": "runner-poolsandbox", "namespace": "github-actions-runner-operator", "error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"runner-poolsandbox\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227
2022-08-31T11:50:19.474Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:50:19.705Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:51:19.716Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:51:19.949Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:52:19.963Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:52:20.190Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:53:20.203Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:53:20.434Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:54:20.450Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:54:20.684Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:55:20.702Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:55:20.935Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:56:20.954Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}
2022-08-31T11:56:21.209Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-actions-runner-operator/runner-poolsandbox"}

In the tests I have run using the master image, the behaviour seems to be:

  • Scale pod up
  • Schedule workload on pod
  • Scale up additional pod to pick up work
  • Remove original pod once work is complete and no jobs are pending
  • Additional pod remains waiting to pick up work
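
The sequence above is consistent with a rule of "keep one idle runner on top of the busy ones". Below is a minimal sketch of such a rule, inferred from the observed behaviour only. The function name `desiredPods` and its parameters `idleBuffer` and `maxPods` are hypothetical; this is an assumption about the behaviour, not the operator's actual scaling logic.

```go
package main

import "fmt"

// desiredPods returns how many runner pods should exist, given how many
// are currently busy. It keeps idleBuffer spare pods waiting for work and
// never exceeds maxPods. This is an inferred, hypothetical rule.
func desiredPods(busy, idleBuffer, maxPods int) int {
	want := busy + idleBuffer
	if want > maxPods {
		return maxPods
	}
	return want
}

func main() {
	const idleBuffer, maxPods = 1, 5 // illustrative values

	// Walk through the observed sequence: idle, one job running, job done.
	for _, busy := range []int{0, 1, 0} {
		fmt.Printf("busy=%d -> pods=%d\n", busy, desiredPods(busy, idleBuffer, maxPods))
	}
}
```

Under this rule a finished pod is removed only because the pool shrinks back to its idle buffer, not because the pod is treated as single-use, which is why disk usage on the node can accumulate while jobs keep arriving.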

This is great if we don't have many jobs waiting to be processed. However, we sometimes have tens of jobs waiting, and we don't want to run the risk of running out of disk space on our cluster nodes. We are looking at implementing Karpenter in the future to handle the scaling of cluster nodes, but we don't have the time to do so right now.

Is there a way right now to make the master image behave in an ephemeral way, recreating the pod and its emptyDir volumes when the job has finished?

Thanks in advance

Labels: question (Further information is requested)
