Description
(We're technically running 0.10.0. We'll try to upgrade soon...)
Our GitHub runners got stuck. Since we have a size limit on our pool, we edited each stuck pod to remove its finalizer and then deleted it. This may or may not have allowed some runners to start.
Eventually the operator wouldn't start any more runners for the pool, apparently because the system had reached a limit. (Not quite sure why; we can file a separate bug about that later.)
To try to fix things, someone deleted the GithubActionRunner object, hoping this would unstick the operator (ArgoCD manages the object, so it resurrected it immediately after deletion).
Instead, we get:
2023-02-03T16:52:49.900Z INFO controllers.GithubActionRunner Pods and runner API not in sync, returning early {"githubactionrunner": "github/docker-runner-pool"}
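For context, here's a minimal sketch (in Go, with hypothetical names; I haven't checked this against the actual 0.10.0 source) of the kind of guard that could produce this log, assuming the reconciler compares the runner pods it sees in the cluster against the runners registered in the GitHub API:

```go
package controllers

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

// apiRunner stands in for a runner record from the GitHub runners API
// (hypothetical type; the real operator presumably uses the go-github
// client's types here).
type apiRunner struct {
	Name string
}

// handleScaling sketches the early-return guard that seems to produce the
// "not in sync" log above: if the runner pods in the cluster don't match the
// runners registered with GitHub, skip scaling and requeue, assuming the two
// views will converge on their own.
func handleScaling(ctx context.Context, pods []corev1.Pod, runners []apiRunner) (ctrl.Result, error) {
	logger := log.FromContext(ctx)
	if len(pods) != len(runners) {
		logger.Info("Pods and runner API not in sync, returning early")
		// If the views never converge (e.g. pods were force-deleted out
		// from under the operator), every reconcile ends here -- which is
		// the stuck state this issue describes.
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}
	// ...actual scale-up/scale-down logic would go here...
	return ctrl.Result{}, nil
}
```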
My guess is that the only way to "fix" this is to restart the operator, but ideally the operator should be more tolerant of this case.
Rough thoughts:
- It'd be nice if the operator could recognize "oh, the object I'm talking to is not the one I was monitoring, and it's younger than the one I was monitoring", then discard its old state and start tracking state against the new object (see the sketch after this list).
- To some extent, it might also be nice if there were a way to ask the operator to do things like "drop" its cached state, or force a refresh, or ...
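To make the first thought concrete, here's a minimal sketch of what I mean, with hypothetical names (staleStateDetector and isRecreated aren't in the operator; this is just illustration). Kubernetes assigns a recreated object a fresh metadata.uid even when the name is identical, so the UID alone is enough to detect "this is not the object I was monitoring":

```go
package controllers

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// staleStateDetector remembers the UID of each custom resource the operator
// has reconciled. A deleted-and-recreated object gets a new UID even under
// the same name, so a UID mismatch means any state cached for that name is
// stale and belongs to the old object.
type staleStateDetector struct {
	mu   sync.Mutex
	uids map[types.NamespacedName]types.UID
}

// isRecreated reports whether the object at key now has a different UID than
// the one recorded on the previous reconcile, updating the record either way.
func (d *staleStateDetector) isRecreated(key types.NamespacedName, current types.UID) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.uids == nil {
		d.uids = make(map[types.NamespacedName]types.UID)
	}
	previous, seen := d.uids[key]
	d.uids[key] = current
	return seen && previous != current
}
```

At the top of Reconcile, the operator could then call something like isRecreated(req.NamespacedName, instance.GetUID()) and, when it returns true, drop whatever per-pool state it has cached before proceeding, rather than comparing the new object against leftovers from the old one.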