Add support for container restart policy#1258
Conversation
Containers move to .bootstrapped state after assigning SandboxClient. Following invariant holds: 1. If .stopped, state.client = nil 2. If .bootstrapped, state.client != nil. It'd better to tightly couple the state and client.
Then, add `restarting` state to ContainerStatus
ContainersService spawn an unstructured Task for restart scheduler, which receives stopped container ids through a async queue and restart them if matching conditions. While it doesn't run under lock, it doesn't introduce a new level of race, as all the actual operations changing container state (e.g., `bootstrap`) are lock protected.
This reuses ExitMonitor, which was not initially for this. It might be better to come up with generic Monitor structure?
|
@JaewonHur Busy week and I hope to get to review this here soon. I saw your comment and I am all for following the direction we take in this PR. Though it might be worthwhile to get started on the discussion on how we want to handle test cases and maybe work together on that piece? |
doStop cannot guarantee the container is stopped in short term.
|
Hi @saehejkang, sorry for moving forward on my own. I just did minimal implementation today, but I'll definitely put our name together here. Further things to do might be:
|
| } | ||
|
|
||
| public static let `default` = ContainerCreateOptions(autoRemove: false) | ||
| public init(from decoder: Decoder) throws { |
There was a problem hiding this comment.
Why was the decoder not being used before when it was just autoRemove?
There was a problem hiding this comment.
I actually have no idea :)
Maybe, wanted to keep the code as simple as possible at that time?
| case stopped | ||
| /// The object is waiting to be restarted. | ||
| case restarting | ||
| /// The object is currently bootstrapped. |
There was a problem hiding this comment.
What does the bootstrapped status mean? The container or sandbox is stopped but ready to be started?
There was a problem hiding this comment.
bootstrapped means (by itself) the container is bootstrapped (i.e., bootstrap function has been successfully executed on that container).
Before this update, the state was same stopped whether it is bootstrapped or not. The only difference was SandboxClient is allocated or not---I personally felt that somewhat weird, but left there.
But after adding restarting state, it becomes more weird, since container state should go restarting (waiting for backOff), then stopped (after bootstrap) then running`.
BTW, now we split the RuntimeStatus to ContainerStatus and SandboxStatus, so it only applies to containers.
There was a problem hiding this comment.
While writing this comment, I found that I missed setting the container restarting in scheduler lol.
| return results | ||
| } | ||
|
|
||
| public func runRestartScheduler() throws { |
There was a problem hiding this comment.
Never thought I would see a scheduler again after OS in college 😆.
Solid foundation work was done with the restart policy and I am sure it could follow the same pattern?
What would be the use case for this? Could it be used to automatically stop trying to restart a container after a certain count?
Hopefully we can use something like #1176 to be a sort of final e2e test of all the policies. |
Thanks, if you make PR to my branch, let me merge it :)
Yes. Docker supports that functionality (https://docs.docker.com/engine/containers/start-containers-automatically/) |
This PR initiates the implementation for adding restart feature (#286), which can be extended to support auto-start later (#158, #1201). The first milestone is to implement restart policies
no,on-failure, andalways(which behaves similar to docker-restart-policy).To support restart policy, following persistent and runtime state variables should be added:
ContainerCreateOption(persistent)RestartPolicy: remember restart policy across the container life cycle.ContainerState(run-time)StoppedState(startError: Error?, exitStatus: ExitStatus?): determine restart logic when container exitsmanualStopped: Bool: remember if the container is stopped by userTrueon first container creation,FalseonloadAtBootbackOff: Duration?: remember backoff timer.nilnilonstartErrorormanualStoppedA new state should be added in container life cycle:
restarting: restarted containers first go to this state and waits forbackOffto start.Implementation steps are described below:
restartingstate, where the restarting containers first visit, and proceeds tobootstrap(afterbackOff) thenstartProcess.handleContainerExitrunning---i.e., resetting thebackOff.stabilityMonitor(similar toExitMonitor) that registers callback on successfulstartProcess.Type of Change
Motivation and Context
Users can setup auto-restarting containers using this feature.
Testing