Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate the Worker State Reconciler #8559

Merged
merged 14 commits into from
Oct 5, 2023

Conversation

rfranzke
Copy link
Member

How to categorize this PR?

/area scalability cost
/kind enhancement

What this PR does / why we need it:
This PR drops the Worker state reconciler that currently is auto-added to all Worker controllers. Please see GEP-22 for information about the motivation.

With this PR, the migration flow is adapted as follows:

  • gardenlet collects machine state and persists it in the ShootState under .spec.gardener[].type=machine-state.
  • gardenlet shallow-deletes the machine-related resources (this was previously done by the generic Worker actuator).
  • For a few releases, gardenlet no longer persists the state stored in .status.state of Worker resources (to not duplicate the machine state in the ShootState).

The restoration flow is adapted as follows:

  • The generic Worker actuator tries to read the machine state directly from the ShootState in the garden cluster (which is now possible since Garden Cluster Access For Extensions In Seed Clusters #8001). This alleviates the necessity to let gardenlet read it and populate it somehow to the destination seed for the provider extensions to pick up - they can do it directly.
  • gardenlet fetches the machine state from the ShootState's .spec.gardener[] field (previously, the machine state was fetched from .spec.extensions[].kind=Worker) and still populates it to the Worker's .status.state field. This is just to preserve backwards-compatibility, because registered extensions might not yet have vendored the newest library, hence they might still rely on this field during restoration. This will be dropped in a future version.

The PR also addresses a TODO leftover task from #8082 about only persisting the ShootState after all extension resources have been migrated.

Which issue(s) this PR fixes:
Fixes #8488

Special notes for your reviewer:

  • The version skew policy for extensions was clarified as part of this PR.
  • github.com/gardener/machine-controller-manager was revendored to 0.50.0 according to [ci:component:github.com/gardener/machine-controller-manager:v0.49.3->v0.50.0] #8539. In this version, the legacy provider-specific API types are cleaned up. The revendoring was needed to make the machine.sapcloud.io/v1alpha1 CRDs available for the integration tests.
  • Accordingly, the no longer needed MachineClass-related methods were dropped from the generic Worker actuator interface.

/cc @plkokanov

Release note:

The `Worker` state reconciler has been dropped, i.e., updated provider extensions will no longer populate the machine state to the `.status.state` field of `Worker` resources. For a few releases, `gardenlet` will no longer persist any still existing data in the `.status.state` field of `Worker` resources during a control plane migration of a `Shoot`, and it will set `.status.state` to `nil` after a successful reconciliation or restore operation.
Provider extensions must now pass the `cluster.Cluster` object for the garden cluster to the `genericactuator.NewActuator` function. See [this](https://github.com/gardener/gardener/blob/8d2f116aa606e5181cd430e5063dd798629bdc78/cmd/gardener-extension-provider-local/app/app.go#L228-L246) for an example how to create such a `cluster.Cluster` object.
The `MachineClassKind()`, `MachineClass()`, and `MachineClassList()` methods have been dropped from the generic `Worker` actuator's interface and do not need to be implemented anymore.
During the `Migrate` phase of a control plane migration of a `Shoot`, the state is now only persisted after all extension resources have been migrated. Consequently, make sure that you have added all state to the `.status.state` field of the respective extension object when running `Migrate()`.

@gardener-prow gardener-prow bot added area/scalability Scalability related area/cost Cost related kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 26, 2023
@plkokanov
Copy link
Contributor

/assign

@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 26, 2023
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 26, 2023
@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 28, 2023
@timuthy
Copy link
Member

timuthy commented Sep 28, 2023

/assign

@gardener-prow gardener-prow bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 29, 2023
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 29, 2023
Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR and as usual very understandable 👏 😄
Some minor points, rest looks good to me.

pkg/utils/gardener/machines.go Outdated Show resolved Hide resolved
pkg/utils/gardener/machines.go Outdated Show resolved Hide resolved
pkg/utils/gardener/shootstate/machines.go Outdated Show resolved Hide resolved
pkg/utils/gardener/shootstate/shootstate.go Outdated Show resolved Hide resolved
pkg/component/extensions/worker/worker.go Show resolved Hide resolved
docs/extensions/migration.md Outdated Show resolved Hide resolved
@rfranzke rfranzke requested a review from timuthy October 1, 2023 14:05
Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Oct 4, 2023
Now `gardenlet` persists the machine state as part of `shootstate.Deploy`. This function is executed after all extension resources were migrated.
Will be needed in `botanist/migration.go` in a subsequent commit.
Now that the `gardenlet` persists the machine state explicitly, we do not need to duplicate it via the `Worker` state.
- For backwards-compatibility, we have to keep this flow since the generic `Worker` actuator's `Restore` function expects to find the state in the `Worker`'s `.status.state` field: https://github.com/gardener/gardener/blob/422e2bbedd23351383154bb733838a416f39f2b6/extensions/pkg/controller/worker/genericactuator/actuator_restore.go#L121C1-L141
- This is somewhat dirty for now, but probably acceptable given that this was the flow for the past years.
- A subsequent commit will adapt the generic `Worker` actuator to fetch the state from elsewhere, however we have to wait until all provider extensions have been re-vendored with the new logic before we change this here.
…luster

This is to prevent `gardenlet` from duplicating the machine state into the destination seed cluster.
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 4, 2023
@gardener-prow gardener-prow bot requested a review from timuthy October 4, 2023 09:08
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 4, 2023
Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Oct 4, 2023
@gardener-prow
Copy link
Contributor

gardener-prow bot commented Oct 4, 2023

LGTM label has been added.

Git tree hash: 9a306655be5d3e65493773d9f178233a223eb630

@plkokanov
Copy link
Contributor

/lgtm

Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@gardener-prow
Copy link
Contributor

gardener-prow bot commented Oct 5, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: timuthy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2023
@gardener-prow
Copy link
Contributor

gardener-prow bot commented Oct 5, 2023

@rfranzke: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-gardener-apidiff 5388995 link false /test pull-gardener-apidiff

Full PR test history. Your PR dashboard. Command help for this repository.
Please help us cut down on flakes by linking this test failure to an open flake report or filing a new flake report if you can't find an existing one. Also see our testing guideline for how to avoid and hunt flakes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@shafeeqes
Copy link
Contributor

/test pull-gardener-unit

@gardener-prow gardener-prow bot merged commit 25ecc27 into gardener:master Oct 5, 2023
15 of 16 checks passed
@rfranzke rfranzke deleted the worker-state branch October 5, 2023 09:25
rfranzke added a commit to rfranzke/gardener that referenced this pull request Nov 30, 2023
follow-up of gardener#8559, released with `v1.82.0`
gardener-prow bot pushed a commit that referenced this pull request Dec 5, 2023
* Remove MCM legacy CRD deletion

follow-up of #8559, released with `v1.82.0`

* Remove legacy `shoot-node-logging` MR cleanup

follow-up of #8501, released with `v1.80.0`

* Remove MCM legacy resources cleanup in generic `Worker` actuator

follow-up of #8596, released with `v1.82.0`

* Restrict GRM's token requestor to secrets with `class=shoot`

follow-up of #8152, released with `v1.74.0`

* Remove support for deprecated `NetworkPolicy` annotations

follow-up of #7907, released with `v1.71.0`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cost Cost related area/scalability Scalability related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[GEP-22] Eliminate the Worker State Reconciler
4 participants