This repository was archived by the owner on Sep 18, 2020. It is now read-only.

Design: Separate reboot coordination and version operator #98

@dghubble

Proposal

Simplify the update-operator so that it only performs update coordination and no longer deploys the agent. Write a separate version-operator that ensures the desired versions of the coordinator and agent are running, creating them if they are missing and managing migrations, if any are needed in the future.

In the desired architecture, there would be three single-purpose components (names are examples):

  • container-linux-update-coordinator - a deployment that watches node annotations and coordinates reboots so that not too many nodes reboot at once. It does not deploy the agent.
  • container-linux-update-agent - a daemonset that listens for D-Bus signals from update-engine and sets node annotations to indicate that its node needs a reboot.
  • container-linux-version-operator - (optional) ensures that the correct versions of the update-coordinator and update-agent are running via a reconciliation loop, and handles migrations if needed.
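As a rough illustration of the coordinator's core decision, here is a minimal Go sketch. It assumes nodes report their state through two hypothetical annotations (`example.com/reboot-needed` and `example.com/reboot-in-progress`) and a hypothetical `maxRebooting` limit, and it uses a simplified `Node` type in place of the Kubernetes client types. It only decides which waiting nodes may start rebooting; writing that decision back to the nodes is out of scope.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical annotation keys; the real agent and coordinator may use different names.
const (
	annotationRebootNeeded     = "example.com/reboot-needed"
	annotationRebootInProgress = "example.com/reboot-in-progress"
)

// Node is a simplified stand-in for a Kubernetes Node: a name plus its annotations.
type Node struct {
	Name        string
	Annotations map[string]string
}

// selectRebootCandidates returns the names of waiting nodes that may begin
// rebooting now, so that at most maxRebooting nodes reboot at the same time.
func selectRebootCandidates(nodes []Node, maxRebooting int) []string {
	rebooting := 0
	var waiting []string
	for _, n := range nodes {
		switch {
		case n.Annotations[annotationRebootInProgress] == "true":
			rebooting++ // already rebooting: counts against the limit
		case n.Annotations[annotationRebootNeeded] == "true":
			waiting = append(waiting, n.Name) // agent asked for a reboot
		}
	}
	sort.Strings(waiting) // deterministic order for this sketch

	slots := maxRebooting - rebooting
	if slots <= 0 {
		return nil // limit already reached; nobody else may start
	}
	if slots > len(waiting) {
		slots = len(waiting)
	}
	return waiting[:slots]
}

func main() {
	nodes := []Node{
		{Name: "node-a", Annotations: map[string]string{annotationRebootInProgress: "true"}},
		{Name: "node-c", Annotations: map[string]string{annotationRebootNeeded: "true"}},
		{Name: "node-b", Annotations: map[string]string{annotationRebootNeeded: "true"}},
		{Name: "node-d", Annotations: map[string]string{}},
	}
	fmt.Println(selectRebootCandidates(nodes, 2)) // [node-b]
}
```

Because `node-a` is already rebooting and the limit is 2, only one waiting node (`node-b`, first in sorted order) is allowed to start.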

A user may choose to deploy just the update-coordinator and update-agent. Alternatively, they could deploy the container-linux-version-operator to manage those apps on their behalf; the version-operator would contain the reconciliation and migration logic.
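The version-operator's reconciliation loop could look roughly like the following minimal Go sketch. It assumes desired and observed state are simple maps from component name to version string (the versions shown are hypothetical), and that the `observe` and `apply` callbacks stand in for real Kubernetes API calls. Unlike a one-time check on startup, the loop re-compares desired and observed state on every pass, so a component that goes missing or drifts to the wrong version is recreated or updated. Migration logic is not modeled.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Action describes one change needed to move the cluster toward the desired state.
type Action struct {
	Kind      string // "create" or "update"
	Component string
	Version   string
}

// reconcile compares desired and observed versions and returns the actions needed.
// Components that are observed but not desired are ignored in this sketch.
func reconcile(desired, observed map[string]string) []Action {
	names := make([]string, 0, len(desired))
	for name := range desired {
		names = append(names, name)
	}
	sort.Strings(names) // stable action order

	var actions []Action
	for _, name := range names {
		want := desired[name]
		have, ok := observed[name]
		switch {
		case !ok:
			actions = append(actions, Action{Kind: "create", Component: name, Version: want})
		case have != want:
			actions = append(actions, Action{Kind: "update", Component: name, Version: want})
		}
	}
	return actions
}

// runReconcileLoop calls reconcile repeatedly rather than once at startup.
func runReconcileLoop(iterations int, interval time.Duration, desired map[string]string,
	observe func() map[string]string, apply func(Action)) {
	for i := 0; i < iterations; i++ {
		for _, a := range reconcile(desired, observe()) {
			apply(a)
		}
		time.Sleep(interval)
	}
}

func main() {
	desired := map[string]string{
		"container-linux-update-agent":       "v0.4.0",
		"container-linux-update-coordinator": "v0.4.0",
	}
	// Simulated cluster: the agent is missing and the coordinator is outdated.
	cluster := map[string]string{"container-linux-update-coordinator": "v0.3.0"}

	observe := func() map[string]string {
		snapshot := make(map[string]string, len(cluster))
		for k, v := range cluster {
			snapshot[k] = v
		}
		return snapshot
	}
	apply := func(a Action) {
		fmt.Printf("%s %s -> %s\n", a.Kind, a.Component, a.Version)
		cluster[a.Component] = a.Version
	}
	runReconcileLoop(3, 10*time.Millisecond, desired, observe, apply)
}
```

On the first pass this prints a `create` action for the agent and an `update` action for the coordinator; later passes find nothing to do until the observed state drifts again.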

Problem

In the current design, the update-operator both coordinates updates and deploys the agent (conditionally, depending on flags) at a hardcoded version. This is problematic for several reasons:

  • Users ask for the ability to customize the update-agent daemonset through the update-operator. Supporting this would greatly expand the operator's scope, since it would have to accept configuration both for the agent it deploys and for reboot coordination. There is no plan for how migrations will work.
  • The update-operator does not run a reconciliation loop to ensure the update-agent is running; it only performs a one-time check on startup.
  • The update-operator makes assumptions about compatibility between its own releases and update-agent releases. That complexity should live in a component designed for it.
  • The -manage-agent=true/false option is a code smell that reflects this mixed responsibility. https://github.com/coreos/container-linux-update-operator/blob/master/cmd/update-operator/main.go#L19
  • A release may only need to update one component, yet right now we bump both.
  • The design differs from practices established through experience with other CoreOS and Tectonic operators.

Discussed with @aaronlevy and @euank
