Eliminate the Worker State Reconciler (gardener#8559)
* Generate `machine.sapcloud.io` CRDs based on vendored MCM version

The CRDs will be needed in a subsequent commit when adapting the
integration test for the `shoot-state` reconciler (we need to deploy the
CRDs in the testenv then so that the reconciler can fetch the machine
objects).

* Drop `MachineClass{Kind,List}` methods from `WorkerDelegate` interface

For a long time now, all MCM providers have used only the generic `MachineClass` type instead of their provider-specific machine class types (e.g., `AWSMachineClass`).
Hence, these interface methods are no longer needed.

* Clarify extension library version skew

* `ShootState` is only persisted after extension resources were migrated

Follow-up of gardener@7cd88ad

* Move `extensions/pkg/controller/worker/helper.BuildOwnerToMachine{Set}sMap` functions to `pkg/utils/gardener` package

They will be reused by other (to-be-introduced) functions in the `pkg/utils/gardener` packages, and we don't want to import the extensions library from there. Hence, it is better to move them.

* Move machine state computation logic from state reconciler to `pkg/utils/gardener/shootstate` package

* Drop worker state reconciler

Now `gardenlet` persists the machine state as part of `shootstate.Deploy`. This function is executed after all extension resources were migrated.

* Move `machineclass` purpose constant to `constants` package

Will be needed in `botanist/migration.go` in a subsequent commit.

* Move machine migration logic from generic `Worker` actuator to botanist

* Drop `Worker` state when persisting `ShootState`

Now that the `gardenlet` persists the machine state explicitly, we do not need to duplicate it via the `Worker` state.

* `Worker` restoration uses machine state stored in `.spec.gardener[]`

- For backwards-compatibility, we have to keep this flow since the generic `Worker` actuator's `Restore` function expects to find the state in the `Worker`'s `.status.state` field: https://github.com/gardener/gardener/blob/422e2bbedd23351383154bb733838a416f39f2b6/extensions/pkg/controller/worker/genericactuator/actuator_restore.go#L121C1-L141
- This is somewhat dirty for now, but probably acceptable given that this was the flow for the past years.
- A subsequent commit will adapt the generic `Worker` actuator to fetch the state from elsewhere. However, we have to wait until all provider extensions have been re-vendored with the new logic before we change this here.

* Extensions fetch machine state directly from `ShootState` in garden cluster

This is to prevent `gardenlet` from duplicating the machine state into the destination seed cluster.

* Clean `Worker`'s `.status.state` field after successful reconcile/restore

* Address PR review feedback
rfranzke authored Oct 5, 2023
1 parent a7b677c commit 25ecc27
Showing 81 changed files with 2,670 additions and 8,408 deletions.
@@ -177,6 +177,13 @@ rules:
- filters.fluentbit.fluent.io
- outputs.fluentbit.fluent.io
- parsers.fluentbit.fluent.io
# TODO(rfranzke): Remove this code after Gardener v1.83 has been released.
- alicloudmachineclasses.machine.sapcloud.io
- awsmachineclasses.machine.sapcloud.io
- azuremachineclasses.machine.sapcloud.io
- gcpmachineclasses.machine.sapcloud.io
- openstackmachineclasses.machine.sapcloud.io
- packetmachineclasses.machine.sapcloud.io
verbs:
- delete
- apiGroups:
@@ -437,10 +444,10 @@ rules:
- apiGroups:
- machine.sapcloud.io
resources:
- machineclasses
- machinedeployments
- machinesets
- machines
- machineclasses
verbs:
- list
- watch
1 change: 1 addition & 0 deletions cmd/gardener-extension-provider-local/app/app.go
@@ -254,6 +254,7 @@ func NewControllerManagerCommand(ctx context.Context) *cobra.Command {
ingressCtrlOpts.Completed().Apply(&localingress.DefaultAddOptions)
serviceCtrlOpts.Completed().Apply(&localservice.DefaultAddOptions)
workerCtrlOpts.Completed().Apply(&localworker.DefaultAddOptions.Controller)
localworker.DefaultAddOptions.GardenCluster = gardenCluster
localBackupBucketOptions.Completed().Apply(&localbackupbucket.DefaultAddOptions)
localBackupBucketOptions.Completed().Apply(&localbackupentry.DefaultAddOptions)
heartbeatCtrlOptions.Completed().Apply(&heartbeat.DefaultAddOptions)
26 changes: 23 additions & 3 deletions docs/deployment/version_skew_policy.md
@@ -28,7 +28,7 @@ In multi-instance setups of Gardener, the newest and oldest `gardener-apiserver`
Example:

- newest `gardener-apiserver` is at **1.37**
- other `gardener-apiserver` instances are supported at **1.37** and **v1.36**
- other `gardener-apiserver` instances are supported at **1.37** and **1.36**

#### gardener-controller-manager, gardener-scheduler, gardener-admission-controller, gardenlet

@@ -37,8 +37,8 @@ They are expected to match the `gardener-apiserver` minor version, but may be up

Example:

- `gardener-apiserver` is at **v1.37**
- `gardener-controller-manager`, `gardener-scheduler`, `gardener-admission-controller`, and `gardenlet` are supported at **1.37** and **v1.36**
- `gardener-apiserver` is at **1.37**
- `gardener-controller-manager`, `gardener-scheduler`, `gardener-admission-controller`, and `gardenlet` are supported at **1.37** and **1.36**

#### gardener-operator

@@ -87,6 +87,26 @@ Actions:

- Upgrade `gardener-operator` to **1.38**.

## Supported Gardener Extension Versions

Extensions are maintained and released separately and independently of the `gardener/gardener` repository.
Consequently, providing version constraints is not possible in this document.
Sometimes, the documentation of extensions contains compatibility information (e.g., "this extension version is only compatible with Gardener versions higher than **1.80**", see [this example](https://github.com/gardener/gardener-extension-provider-aws#compatibility)).

However, since all extensions typically make use of the [extensions library](../../extensions) ([example](https://github.com/gardener/gardener-extension-provider-aws/blob/cb96b60c970c2e20615dffb3018dc0571cab764d/go.mod#L12)), a general constraint is that _no extension must depend on a version of the extensions library higher than the version of `gardenlet`_.

Example 1:

- `gardener-apiserver` and other Gardener control plane components are at **1.37**.
- All `gardenlet`s are at **1.37**.
- Only extensions are supported which depend on **1.37** or lower of the extensions library.

Example 2:

- `gardener-apiserver` and other Gardener control plane components are at **1.37**.
- Some `gardenlet`s are at **1.37**, others are at **1.36**.
- Only extensions are supported which depend on **1.36** or lower of the extensions library.
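
The constraint from the two examples above can be sketched as a small check. This is only an illustration and not part of Gardener: the type `majorMinor` and the functions `parseMajorMinor` and `extensionSupported` are hypothetical names, and the sketch assumes that only the major and minor versions matter for the comparison.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor is a hypothetical helper type; it is not part of Gardener.
type majorMinor struct{ major, minor int }

// parseMajorMinor parses strings such as "1.37" or "v1.36.2" and ignores
// everything after the minor version.
func parseMajorMinor(s string) (majorMinor, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) < 2 {
		return majorMinor{}, fmt.Errorf("invalid version %q", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return majorMinor{}, fmt.Errorf("invalid major version in %q: %w", s, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return majorMinor{}, fmt.Errorf("invalid minor version in %q: %w", s, err)
	}
	return majorMinor{major: major, minor: minor}, nil
}

// less reports whether v is an older version than o.
func (v majorMinor) less(o majorMinor) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	return v.minor < o.minor
}

// extensionSupported reports whether an extension depending on the given
// extensions library version is supported by every listed gardenlet, i.e.
// whether no gardenlet is older than the library version.
func extensionSupported(libraryVersion string, gardenletVersions []string) (bool, error) {
	if len(gardenletVersions) == 0 {
		return false, fmt.Errorf("no gardenlet versions given")
	}
	lib, err := parseMajorMinor(libraryVersion)
	if err != nil {
		return false, err
	}
	for _, g := range gardenletVersions {
		gardenlet, err := parseMajorMinor(g)
		if err != nil {
			return false, err
		}
		if gardenlet.less(lib) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Example 2: some gardenlets are at 1.37, others at 1.36.
	gardenlets := []string{"1.37", "1.36"}

	ok, _ := extensionSupported("1.37", gardenlets)
	fmt.Println(ok) // false: one gardenlet is older than the library

	ok, _ = extensionSupported("1.36", gardenlets)
	fmt.Println(ok) // true
}
```

With mixed `gardenlet` versions, the oldest `gardenlet` determines the highest extensions library version an extension may depend on.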

## Supported Kubernetes Versions

Please refer to [Supported Kubernetes Versions](../usage/supported_k8s_versions.md).
28 changes: 21 additions & 7 deletions docs/extensions/migration.md
@@ -73,18 +73,32 @@ In addition, extension controllers that use [referenced resources](referenced-re

### Migrate and Restore Actuator Methods

Most extension controller implementations follow a common pattern where a generic `Reconciler` implementation delegates to an `Actuator` interface that contains the methods `Reconcile` and `Delete`, provided by the extension. The two new methods `Migrate` and `Restore` have been added to all such `Actuator` interfaces, see [the infrastructure `Actuator` interface](https://github.com/gardener/gardener/blob/master/extensions/pkg/controller/infrastructure/actuator.go) as an example. These methods are called by the generic reconcilers for the [migrate and restore operations](#migrate-and-restore-operations) respectively, and should be implemented by the extension according to the above guidelines.
Most extension controller implementations follow a common pattern where a generic `Reconciler` implementation delegates to an `Actuator` interface that contains the methods `Reconcile` and `Delete`, provided by the extension.
Two methods `Migrate` and `Restore` are available in all such `Actuator` interfaces, see [the infrastructure `Actuator` interface](https://github.com/gardener/gardener/blob/master/extensions/pkg/controller/infrastructure/actuator.go) as an example.
These methods are called by the generic reconcilers for the [migrate and restore operations](#migrate-and-restore-operations) respectively, and should be implemented by the extension according to the above guidelines.

### Owner Checks
### Extension Controllers Based on Generic Actuators

The so called "bad case" scenario for control plane migration proposed in [GEP-17](../proposals/17-shoot-control-plane-migration-bad-case.md) introduced the requirement for extension controllers to check whether they are currently operating in the source or destination seed during reconciliations to avoid the case in which controllers from different seeds can operate on the same IaaS resources (split brain scenario). To that end, a special "owner checking" mechanism has been added to the `Reconciler` implementations of all extension controllers. For an example usage of this mechanism see [the infrastructure Reconciler implementation](https://github.com/gardener/gardener/blob/7ac4b04feec409f3e5a5208cd06af9a10c755337/extensions/pkg/controller/infrastructure/reconciler.go#L109-L121). The purpose of the owner check is to interrupt reconciliations of extension controllers that do not operate in the seed that is currently configured to host the shoot's control plane. Note that `Migrate` operations must not be interrupted, as they are required to clean up Kubernetes resources left in the shoot's control plane namespace and do not act on IaaS resources.
In practice, the implementation of many extension controllers (for example, the `ControlPlane` and `Worker` controllers in most provider extensions) are based on a *generic `Actuator` implementation* that only delegates to extension methods for behavior that is truly provider specific.
In all such cases, the `Migrate` and `Restore` methods have already been implemented properly in the generic actuators and there is nothing more to do in the extension itself.

### Extension Controllers Based on Generic Actuators
In some rare cases, extension controllers based on a generic actuator might still introduce a custom `Actuator` implementation to override some of the generic actuator methods in order to enhance or change their behavior in a certain way.
In such cases, the `Migrate` and `Restore` methods might need to be overridden as well, see the [Azure controlplane controller](https://github.com/gardener/gardener-extension-provider-azure/tree/master/pkg/controller/controlplane) as an example.

In practice, the implementation of many extension controllers (for example, the controlplane and worker controllers in most provider extensions) are based on a *generic `Actuator` implementation* that only delegates to extension methods for behavior that is truly provider specific. In all such cases, the `Migrate` and `Restore` methods have already been implemented properly in the generic actuators and there is nothing more to do in the extension itself.
#### `Worker` State

In some rare cases, extension controllers based on a generic actuator might still introduce a custom `Actuator` implementation to override some of the generic actuator methods in order to enhance or change their behavior in a certain way. In such cases, the `Migrate` and `Restore` methods might need to be overridden as well, see the [Azure controlplane controller](https://github.com/gardener/gardener-extension-provider-azure/tree/master/pkg/controller/controlplane) as an example.
Note that the machine state is handled specially by `gardenlet` (i.e., all relevant objects in the `machine.sapcloud.io/v1alpha1` API are directly persisted by `gardenlet` and **NOT** by the generic actuators).
In the past, they were persisted to the `Worker`'s `.status.state` field by the so-called "worker state reconciler", however, this reconciler was dropped and changed as part of [GEP-22](../proposals/22-improved-usage-of-shootstate-api.md#eliminating-the-worker-state-reconciler).
Nowadays, `gardenlet` directly writes the state to the `ShootState` resource during the `Migrate` phase of a `Shoot` (without the detour of the `Worker`'s `.status.state` field).
On restoration, unlike for other extension kinds, `gardenlet` no longer populates the machine state into the `Worker`'s `.status.state` field.
Instead, the extension controller should read the machine state directly from the `ShootState` in the garden cluster (see [this document](garden-api-access.md) for information on how to access the garden cluster) and use it to subsequently restore the relevant `machine.sapcloud.io/v1alpha1` resources.
This flow is implemented in the [generic `Worker` actuator](../../extensions/pkg/controller/worker/genericactuator/actuator_restore.go).
As a result, extension controllers using this generic actuator do not need to implement any custom logic.

### Extension Controllers Not Based on Generic Actuators

The implementation of some extension controllers (for example, the infrastructure controllers in all provider extensions) are not based on a generic `Actuator` implementation. Such extension controllers must always provide a proper implementation of the `Migrate` and `Restore` methods according to the above guidelines, see the [AWS infrastructure controller](https://github.com/gardener/gardener-extension-provider-aws/tree/master/pkg/controller/infrastructure) as an example. In practice, this might result in code duplication between the different extensions, since the `Migrate` and `Restore` code is usually not provider or OS-specific.
The implementation of some extension controllers (for example, the infrastructure controllers in all provider extensions) are not based on a generic `Actuator` implementation.
Such extension controllers must always provide a proper implementation of the `Migrate` and `Restore` methods according to the above guidelines, see the [AWS infrastructure controller](https://github.com/gardener/gardener-extension-provider-aws/tree/master/pkg/controller/infrastructure) as an example.
In practice, this might result in code duplication between the different extensions, since the `Migrate` and `Restore` code is usually not provider or OS-specific.

> If you do not use the generic `Worker` actuator, see [this section](#worker-state) for information on how to handle the machine state related to the `Worker` resource.
109 changes: 109 additions & 0 deletions example/seed-crds/10-crd-machine.sapcloud.io_machineclasses.yaml
@@ -0,0 +1,109 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.13.0
  name: machineclasses.machine.sapcloud.io
spec:
  group: machine.sapcloud.io
  names:
    kind: MachineClass
    listKind: MachineClassList
    plural: machineclasses
    shortNames:
    - mcc
    singular: machineclass
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: MachineClass can be used to templatize and re-use provider configuration
          across multiple Machines / MachineSets / MachineDeployments.
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          credentialsSecretRef:
            description: CredentialsSecretRef can optionally store the credentials
              (in this case the SecretRef does not need to store them). This might
              be useful if multiple machine classes with the same credentials but
              different user-datas are used.
            properties:
              name:
                description: name is unique within a namespace to reference a secret
                  resource.
                type: string
              namespace:
                description: namespace defines the space within which the secret name
                  must be unique.
                type: string
            type: object
            x-kubernetes-map-type: atomic
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          nodeTemplate:
            description: NodeTemplate contains subfields to track all node resources
              and other node info required to scale nodegroup from zero
            properties:
              capacity:
                additionalProperties:
                  anyOf:
                  - type: integer
                  - type: string
                  pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                  x-kubernetes-int-or-string: true
                description: Capacity contains subfields to track all node resources
                  required to scale nodegroup from zero
                type: object
              instanceType:
                description: Instance type of the node belonging to nodeGroup
                type: string
              region:
                description: Region of the expected node belonging to nodeGroup
                type: string
              zone:
                description: Zone of the expected node belonging to nodeGroup
                type: string
            required:
            - capacity
            - instanceType
            - region
            - zone
            type: object
            x-kubernetes-preserve-unknown-fields: true
          provider:
            description: Provider is the combination of name and location of cloud-specific
              drivers.
            type: string
          providerSpec:
            description: Provider-specific configuration to use during node creation.
            type: object
            x-kubernetes-preserve-unknown-fields: true
          secretRef:
            description: SecretRef stores the necessary secrets such as credentials
              or userdata.
            properties:
              name:
                description: name is unique within a namespace to reference a secret
                  resource.
                type: string
              namespace:
                description: namespace defines the space within which the secret name
                  must be unique.
                type: string
            type: object
            x-kubernetes-map-type: atomic
        required:
        - providerSpec
        type: object
    served: true
    storage: true