Design: Separate reboot coordination and version operator #98
Description
Proposal
Simplify the update-operator so that it performs only update coordination and no longer deploys the agent. Write a separate version-operator which ensures the desired versions of the coordinator and agent are running (creating them if they are not, and managing migrations if any are needed in future).
In the desired architecture, there would be three single-purpose components (names are examples):
- container-linux-update-coordinator - deployment which watches node annotations and coordinates reboots as needed to ensure not too many are rebooting at once. (doesn't deploy agent)
- container-linux-update-agent - daemonset which listens for D-Bus signals from update-engine and indicates via annotations that it needs a reboot.
- container-linux-version-operator - (optional) ensures the correct version of the update-coordinator and update-agent are running via a reconciliation loop. Handles migrations, if needed.
A user may choose to deploy just the update-coordinator and update-agent. Alternatively, they could deploy the container-linux-version-operator to manage those apps on their behalf; it would contain the reconciliation and migration logic.
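To make the coordinator's single responsibility concrete, here is a minimal sketch in Go of the decision it would make: given the node annotations, pick which nodes may start rebooting without exceeding a limit. The annotation keys, the Node type, and the maxRebooting parameter are hypothetical names chosen for illustration; they are not taken from the existing code base, and a real coordinator would read nodes from the Kubernetes API rather than from an in-memory slice.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical annotation keys, chosen for illustration only.
const (
	annoRebootNeeded     = "example.com/reboot-needed"
	annoRebootInProgress = "example.com/reboot-in-progress"
)

// Node is a simplified stand-in for a Kubernetes node object.
type Node struct {
	Name        string
	Annotations map[string]string
}

// selectForReboot returns the names of nodes that may start rebooting now,
// keeping the number of in-progress reboots at or below maxRebooting.
func selectForReboot(nodes []Node, maxRebooting int) []string {
	inProgress := 0
	var waiting []string
	for _, n := range nodes {
		switch {
		case n.Annotations[annoRebootInProgress] == "true":
			inProgress++
		case n.Annotations[annoRebootNeeded] == "true":
			waiting = append(waiting, n.Name)
		}
	}
	sort.Strings(waiting) // deterministic order for the sketch

	budget := maxRebooting - inProgress
	if budget <= 0 {
		return nil // limit already reached, nobody else may reboot
	}
	if budget > len(waiting) {
		budget = len(waiting)
	}
	return waiting[:budget]
}

func main() {
	nodes := []Node{
		{Name: "node-a", Annotations: map[string]string{annoRebootInProgress: "true"}},
		{Name: "node-b", Annotations: map[string]string{annoRebootNeeded: "true"}},
		{Name: "node-c", Annotations: map[string]string{annoRebootNeeded: "true"}},
		{Name: "node-d", Annotations: map[string]string{}},
	}
	// One reboot is in progress and the limit is 2, so one more node is allowed.
	fmt.Println(selectForReboot(nodes, 2)) // prints [node-b]
}
```

Note that nothing in this sketch deploys or configures the agent. The coordinator only consumes the annotations that the agent writes, which is the separation the proposal asks for.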
Problem
In the current design, the update-operator is doing both update coordination and deployment of the agent (conditionally depending on flags) at a hardcoded version. This is problematic:
- Users ask for the ability to customize the update-agent daemonset through the update-operator. This expands the scope of the operator greatly and makes it take configuration for the agent it deploys and for coordinating reboots. There is no plan for how migrations will work.
- update-operator is not currently performing a reconciliation loop to ensure the update-agent is running. It's just performing a one-time check on startup.
- update-operator is making assumptions about compatibility between itself and update-agent releases. That complexity should live in a component designed for it.
- Oddity or code smell is apparent from the -manage-agent=true/false option. https://github.com/coreos/container-linux-update-operator/blob/master/cmd/update-operator/main.go#L19
- Conceivably we only need to update one component, not both. Right now we bump both.
- Differs from practices established through experience in other CoreOS and Tectonic operators.
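The difference between a one-time startup check and a reconciliation loop can be sketched as follows. This is an illustrative Go sketch under stated assumptions, not the proposed implementation: the Cluster type, the reconcile function, and the version strings are hypothetical, and an in-memory map stands in for the deployments and daemonsets a real version-operator would manage through the Kubernetes API.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster is a hypothetical stand-in for cluster state:
// component name -> version currently running (absent if not deployed).
type Cluster struct {
	Running map[string]string
}

// reconcile drives the running state toward the desired state and returns
// a description of each action taken. Calling it again once the state has
// converged takes no action.
func reconcile(c *Cluster, desired map[string]string) []string {
	names := make([]string, 0, len(desired))
	for name := range desired {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the sketch

	var actions []string
	for _, name := range names {
		want := desired[name]
		have, ok := c.Running[name]
		switch {
		case !ok:
			c.Running[name] = want
			actions = append(actions, "create "+name+" "+want)
		case have != want:
			c.Running[name] = want
			actions = append(actions, "update "+name+" "+have+" -> "+want)
		}
	}
	return actions
}

func main() {
	// Versions are made up; each component is tracked independently,
	// so one can be bumped without the other.
	desired := map[string]string{
		"update-agent":       "v0.2.0",
		"update-coordinator": "v0.2.0",
	}
	c := &Cluster{Running: map[string]string{"update-coordinator": "v0.1.0"}}

	fmt.Println(reconcile(c, desired)) // first pass: creates the agent, updates the coordinator
	fmt.Println(reconcile(c, desired)) // converged: no actions

	delete(c.Running, "update-agent")  // simulate the agent being removed later
	fmt.Println(reconcile(c, desired)) // a later pass recreates the agent
}
```

A one-time check corresponds to calling reconcile once at startup, in which case the later deletion would go unnoticed. A real operator would presumably run the same function on a timer or in response to watch events instead of the three explicit calls shown here.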
Discussed with @aaronlevy and @euank