Design: Separate reboot coordination and version operator #98

### Proposal

Simplify the `update-operator` so that it only performs update coordination and no longer deploys the agent. Write a separate `version-operator` which ensures the desired versions of the coordinator and agent are running (creating them if they are not, and managing any migrations needed in the future).

In the desired architecture, there would be three single-purpose components (names are examples):

* `container-linux-update-coordinator` - deployment which watches node annotations and coordinates reboots so that not too many nodes are rebooting at once. It does not deploy the agent.
* `container-linux-update-agent` - daemonset which listens for D-Bus signals from `update-engine` and indicates via annotations that its node needs a reboot.
* `container-linux-version-operator` - (optional) ensures the correct versions of the `update-coordinator` and `update-agent` are running via a reconciliation loop. Handles migrations, if needed.
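
To make the coordinator's job concrete, here is a minimal sketch of the "not too many rebooting at once" decision, assuming a simplified `Node` type instead of the real Kubernetes API. The annotation keys, the `nextToReboot` function, and the `maxConcurrent` parameter are all hypothetical names chosen for illustration, not the project's actual ones.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical annotation keys; the real agent and coordinator may use
// different names.
const (
	rebootNeeded     = "example.com/reboot-needed"
	rebootInProgress = "example.com/reboot-in-progress"
)

// Node is a simplified stand-in for a Kubernetes node object.
type Node struct {
	Name        string
	Annotations map[string]string
}

// nextToReboot returns the names of nodes that may start rebooting now,
// keeping the number of nodes rebooting at once at or below maxConcurrent.
func nextToReboot(nodes []Node, maxConcurrent int) []string {
	inProgress := 0
	var waiting []string
	for _, n := range nodes {
		switch {
		case n.Annotations[rebootInProgress] == "true":
			inProgress++
		case n.Annotations[rebootNeeded] == "true":
			waiting = append(waiting, n.Name)
		}
	}
	sort.Strings(waiting) // deterministic order for the example

	budget := maxConcurrent - inProgress
	if budget <= 0 {
		return nil
	}
	if budget > len(waiting) {
		budget = len(waiting)
	}
	return waiting[:budget]
}

func main() {
	nodes := []Node{
		{Name: "node-a", Annotations: map[string]string{rebootInProgress: "true"}},
		{Name: "node-b", Annotations: map[string]string{rebootNeeded: "true"}},
		{Name: "node-c", Annotations: map[string]string{rebootNeeded: "true"}},
		{Name: "node-d"},
	}
	// One node is already rebooting, so with a limit of 2 only one more may start.
	fmt.Println(nextToReboot(nodes, 2))
}
```

Nothing in this decision involves deploying or versioning the agent, which is why the coordinator can stay a single-purpose component.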

A user may choose to deploy just the `update-coordinator` and `update-agent`. Alternatively, they could deploy the `container-linux-version-operator` to manage those apps on their behalf, with the reconciliation and migration logic contained in that operator.

### Problem

In the current design, the `update-operator` does both update coordination and deployment of the agent (conditionally, depending on flags) at a hardcoded version. This is problematic for several reasons:

* Users ask for the ability to customize the update-agent daemonset through the update-operator. This greatly expands the scope of the operator, which would have to take configuration both for the agent it deploys and for coordinating reboots. There is no plan for how migrations will work.
* `update-operator` does not currently run a reconciliation loop to ensure the `update-agent` is running. It only performs a one-time check on startup.
* `update-operator` makes assumptions about compatibility between itself and `update-agent` releases. That complexity should live in a component designed for it.
* The `-manage-agent=true/false` option is an oddity, arguably a code smell. https://github.com/coreos/container-linux-update-operator/blob/master/cmd/update-operator/main.go#L19
* Conceivably, a release only needs to update one component, not both. Right now we bump both.
* The current design differs from practices established through experience in other CoreOS and Tectonic operators.
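
The reconciliation and independent-versioning points in the list above can be illustrated with a small sketch. It assumes the operator can observe which version of each component is running, and it models both desired and observed state as plain maps. The `State` type, the `reconcile` function, and the version strings are hypothetical, and a real operator would run this comparison repeatedly rather than once at startup.

```go
package main

import (
	"fmt"
	"sort"
)

// State maps a component name to the version that should be, or is, running.
// A missing key in the observed state means the component is not deployed.
type State map[string]string

// reconcile compares desired against observed and returns the actions needed
// to converge, ordered by component name. Components already at the desired
// version produce no action.
func reconcile(desired, observed State) []string {
	names := make([]string, 0, len(desired))
	for name := range desired {
		names = append(names, name)
	}
	sort.Strings(names)

	var actions []string
	for _, name := range names {
		want := desired[name]
		have, ok := observed[name]
		switch {
		case !ok:
			actions = append(actions, fmt.Sprintf("create %s at %s", name, want))
		case have != want:
			actions = append(actions, fmt.Sprintf("update %s from %s to %s", name, have, want))
		}
	}
	return actions
}

func main() {
	// Hypothetical versions, chosen only for illustration.
	desired := State{"update-agent": "v0.2.0", "update-coordinator": "v0.3.0"}
	observed := State{"update-coordinator": "v0.3.0"}

	// The agent is missing, so one pass yields a single create action and
	// leaves the coordinator, which is already current, untouched.
	for _, action := range reconcile(desired, observed) {
		fmt.Println(action)
	}
}
```

Because each component is compared on its own, a release that changes only the agent results in a single action, and a deleted agent is recreated on the next pass instead of going unnoticed until the operator restarts.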

Discussed with @aaronlevy and @euank 
