Extend health/progressing checks of gardener-resource-manager by `cert-management` resources (gardener#9326)

* Add Cert APIs to resource-manager target scheme

* Prefactor: Add section about `health` controller

* Add `certificate` health check to GRM

* Add `certificate` progressing check to GRM

* Abandon `controller-manager-library` dependency

* Add `issuer` health check to GRM

* Add `issuer` progressing check to GRM

* Drop `metadata.generation` from integration test

The generation field is disregarded when creating
an object with a real client in integration tests.

* Address review feedback

* Address review feedback II
timuthy authored Mar 12, 2024
1 parent 62c487e commit f905029
Showing 20 changed files with 1,338 additions and 28 deletions.
2 changes: 2 additions & 0 deletions .golangci.yaml
@@ -126,6 +126,8 @@ linters-settings:
alias: machinev1alpha1
- pkg: github.com/gardener/etcd-druid/api/v1alpha1
alias: druidv1alpha1
+- pkg: github.com/gardener/cert-management/pkg/apis/cert/v1alpha1
+alias: certv1alpha1
# Gardener extension package
- pkg: github.com/gardener/gardener/extensions/.*/(\w+)/mock$
alias: extensionsmock${1}
48 changes: 44 additions & 4 deletions docs/concepts/resource-manager.md
@@ -135,10 +135,6 @@ The `gardener-resource-manager` can manage a resource in the following supported

The mode for a resource can be specified with the `resources.gardener.cloud/mode` annotation. The annotation should be specified in the encoded resource manifest in the Secret that is referenced by the `ManagedResource`.

-#### Skipping Health Check
-
-If a resource in the `ManagedResource` is annotated with `resources.gardener.cloud/skip-health-check=true`, then the resource will be skipped during health checks by the health controller. The `ManagedResource` conditions will not reflect the health condition of this resource anymore. The `ResourcesProgressing` condition will also be set to `False`.
-
#### Resource Class and Reconcilation Scope

By default, the `gardener-resource-manager` controller watches for `ManagedResource`s in all namespaces.
@@ -251,6 +247,50 @@ By default, cluster id is not used. If cluster id is specified, the format is `<

In addition to the origin annotation, all objects managed by the resource manager get a dedicated label `resources.gardener.cloud/managed-by`. This label can be used to describe these objects with a [selector](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/). By default it is set to "gardener", but this can be overwritten by setting the `.conrollers.managedResources.managedByLabelValue` field in the component configuration.

+### [`health` Controller](../../pkg/resourcemanager/controller/health)
+
+This controller processes `ManagedResource`s that were reconciled by the main [ManagedResource Controller](#managedResource-controller) at least once.
+Its main job is to perform checks for maintaining the well [known conditions](#conditions) `ResourcesHealthy` and `ResourcesProgressing`.
+
+#### Progressing Checks
+
+In Kubernetes, applied changes must usually be rolled out first, e.g. when changing the base image in a `Deployment`.
+Progressing checks detect ongoing roll-outs and report them in the `ResourcesProgressing` condition of the corresponding `ManagedResource`.
+
+The following object kinds are considered for progressing checks:
+- `DaemonSet`
+- `Deployment`
+- `StatefulSet`
+- [`Prometheus`](https://github.com/prometheus-operator/prometheus-operator)
+- [`Alertmanager`](https://github.com/prometheus-operator/prometheus-operator)
+- [`Certificate`](https://github.com/gardener/cert-management)
+- [`Issuer`](https://github.com/gardener/cert-management)
+
+#### Health Checks
+
+`gardener-resource-manager` can evaluate the health of specific resources, often by consulting their conditions.
+Health check results are regularly updated in the `ResourcesHealthy` condition of the corresponding `ManagedResource`.
+
+The following object kinds are considered for health checks:
+- `CustomResourceDefinition`
+- `DaemonSet`
+- `Deployment`
+- `Job`
+- `Pod`
+- `ReplicaSet`
+- `ReplicationController`
+- `Service`
+- `StatefulSet`
+- [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler)
+- [`Prometheus`](https://github.com/prometheus-operator/prometheus-operator)
+- [`Alertmanager`](https://github.com/prometheus-operator/prometheus-operator)
+- [`Certificate`](https://github.com/gardener/cert-management)
+- [`Issuer`](https://github.com/gardener/cert-management)
+
+#### Skipping Health Check
+
+If a resource owned by a `ManagedResource` is annotated with `resources.gardener.cloud/skip-health-check=true`, then the resource will be skipped during health checks by the `health` controller. The `ManagedResource` conditions will not reflect the health condition of this resource anymore. The `ResourcesProgressing` condition will also be set to `False`.
+
### [Garbage Collector For Immutable `ConfigMap`s/`Secret`s](../../pkg/resourcemanager/controller/garbagecollector)

In Kubernetes, workload resources (e.g., `Pod`s) can mount `ConfigMap`s or `Secret`s or reference them via environment variables in containers.
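The "Skipping Health Check" semantics documented above can be pictured with a small model. This is a sketch under assumptions: the `object` type and the `skipsHealthCheck` and `aggregateHealth` functions are hypothetical and only mirror the documented behaviour (annotated objects are ignored when deciding health); they are not the `health` controller's actual code.

```go
package main

import "fmt"

// skipHealthCheckAnnotation is the annotation key documented above.
const skipHealthCheckAnnotation = "resources.gardener.cloud/skip-health-check"

// object is a hypothetical, simplified stand-in for a resource owned by a ManagedResource.
type object struct {
	Name        string
	Annotations map[string]string
	Healthy     bool
}

// skipsHealthCheck reports whether the object opted out of health checks.
// Reading from a nil map is safe in Go and yields "".
func skipsHealthCheck(o object) bool {
	return o.Annotations[skipHealthCheckAnnotation] == "true"
}

// aggregateHealth returns true only if every non-skipped object is healthy,
// together with the names of the unhealthy, non-skipped objects.
func aggregateHealth(objs []object) (bool, []string) {
	var unhealthy []string
	for _, o := range objs {
		if skipsHealthCheck(o) {
			continue // annotated objects do not influence the result
		}
		if !o.Healthy {
			unhealthy = append(unhealthy, o.Name)
		}
	}
	return len(unhealthy) == 0, unhealthy
}

func main() {
	objs := []object{
		{Name: "cert-a", Healthy: true},
		{Name: "cert-b", Healthy: false, Annotations: map[string]string{skipHealthCheckAnnotation: "true"}},
		{Name: "issuer-x", Healthy: false},
	}
	healthy, unhealthy := aggregateHealth(objs)
	fmt.Println(healthy, unhealthy)
}
```

In this model `cert-b` is unhealthy but annotated, so only `issuer-x` makes the aggregate result unhealthy.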
3 changes: 2 additions & 1 deletion go.mod
@@ -9,6 +9,7 @@ require (
github.com/containerd/containerd v1.7.14
github.com/coreos/go-systemd/v22 v22.5.0
github.com/fluent/fluent-operator/v2 v2.7.0
+github.com/gardener/cert-management v0.12.0
github.com/gardener/dependency-watchdog v1.2.1
github.com/gardener/etcd-druid v0.22.0
github.com/gardener/hvpa-controller/api v0.5.0
@@ -187,7 +188,7 @@ require (
golang.org/x/exp v0.0.0-20230905200255-921286631fa9 // indirect
golang.org/x/mod v0.14.0 // indirect
golang.org/x/net v0.21.0 // indirect
-golang.org/x/oauth2 v0.15.0 // indirect
+golang.org/x/oauth2 v0.16.0 // indirect
golang.org/x/sync v0.6.0 // indirect
golang.org/x/sys v0.18.0 // indirect
golang.org/x/term v0.18.0 // indirect
6 changes: 4 additions & 2 deletions go.sum
@@ -145,6 +145,8 @@ github.com/fsnotify/fsnotify v1.4.7/go.mod h1:jwhsz4b93w/PPRr/qN1Yymfu8t87LnFCMo
github.com/fsnotify/fsnotify v1.4.9/go.mod h1:znqG4EE+3YCdAaPaxE2ZRY/06pZUdp0tY4IgpuI1SZQ=
github.com/fsnotify/fsnotify v1.7.0 h1:8JEhPFa5W2WU7YfeZzPNqzMP6Lwt7L2715Ggo0nosvA=
github.com/fsnotify/fsnotify v1.7.0/go.mod h1:40Bi/Hjc2AVfZrqy+aj+yEI+/bRxZnMJyTJwOpGvigM=
+github.com/gardener/cert-management v0.12.0 h1:pMT15xtMFKmaKlzgBohBYOwoE29HC/S3QJKmLLU8zG4=
+github.com/gardener/cert-management v0.12.0/go.mod h1:jSqNDV4H1SR/9lLS412Uhqp0+ibPe6rgRupoDDDOxeg=
github.com/gardener/dependency-watchdog v1.2.1 h1:Q0zqinZNImBuNYfNQGAXkUh5qrfJyrynO5QjUTzO/7w=
github.com/gardener/dependency-watchdog v1.2.1/go.mod h1:RgU0VmsdBHxRU8IO9VsLxEinz58xEJdEz5hxvMqLKHQ=
github.com/gardener/etcd-druid v0.22.0 h1:DVe+Zjrb93r9vI1uUiCTMHBffIUoMAKhNzFZNC6hsQ8=
@@ -643,8 +645,8 @@ golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4Iltr
golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
golang.org/x/oauth2 v0.0.0-20191202225959-858c2ad4c8b6/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
-golang.org/x/oauth2 v0.15.0 h1:s8pnnxNVzjWyrvYdFUQq5llS1PX2zhPXmccZv99h7uQ=
-golang.org/x/oauth2 v0.15.0/go.mod h1:q48ptWNTY5XWf+JNten23lcvHpLJ0ZSxF5ttTHKVCAM=
+golang.org/x/oauth2 v0.16.0 h1:aDkGMBSYxElaoP81NpoUoz2oo2R2wHdZpGToUxfyQrQ=
+golang.org/x/oauth2 v0.16.0/go.mod h1:hqZ+0LWXsiVoZpeld6jVt06P3adbS2Uu911W1SsJv2o=
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
3 changes: 3 additions & 0 deletions hack/generate-crds.sh
@@ -82,6 +82,9 @@ get_group_package () {
"machine.sapcloud.io")
echo "github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1"
;;
+"cert.gardener.cloud")
+echo "github.com/gardener/cert-management/pkg/apis/cert/v1alpha1"
+;;
*)
>&2 echo "unknown group $1"
return 1
2 changes: 2 additions & 0 deletions pkg/resourcemanager/client/scheme.go
@@ -15,6 +15,7 @@
package client

import (
+certv1alpha1 "github.com/gardener/cert-management/pkg/apis/cert/v1alpha1"
druidv1alpha1 "github.com/gardener/etcd-druid/api/v1alpha1"
hvpav1alpha1 "github.com/gardener/hvpa-controller/api/v1alpha1"
machinev1alpha1 "github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1"
@@ -59,6 +60,7 @@ func init() {
monitoringv1beta1.AddToScheme,
monitoringv1.AddToScheme,
vpaautoscalingv1.AddToScheme,
+certv1alpha1.AddToScheme,
)
)

3 changes: 3 additions & 0 deletions pkg/resourcemanager/controller/health/progressing/add.go
@@ -17,6 +17,7 @@ package progressing
import (
"context"

+certv1alpha1 "github.com/gardener/cert-management/pkg/apis/cert/v1alpha1"
"github.com/go-logr/logr"
monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
appsv1 "k8s.io/api/apps/v1"
@@ -86,6 +87,8 @@ func (r *Reconciler) AddToManager(ctx context.Context, mgr manager.Manager, sour
"daemonsets": &appsv1.DaemonSet{},
"prometheuses": &monitoringv1.Prometheus{},
"alertmanagers": &monitoringv1.Alertmanager{},
+"certificates": &certv1alpha1.Certificate{},
+"issuers": &certv1alpha1.Issuer{},
} {
gvr := schema.GroupVersionResource{Group: appsv1.SchemeGroupVersion.Group, Version: appsv1.SchemeGroupVersion.Version, Resource: resource}

15 changes: 13 additions & 2 deletions pkg/resourcemanager/controller/health/progressing/reconciler.go
@@ -18,6 +18,7 @@ import (
"context"
"fmt"

+certv1alpha1 "github.com/gardener/cert-management/pkg/apis/cert/v1alpha1"
"github.com/go-logr/logr"
"github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring"
monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
@@ -105,8 +106,8 @@ func (r *Reconciler) reconcile(ctx context.Context, log logr.Logger, mr *resourc
conditionResourcesProgressing := v1beta1helper.GetOrInitConditionWithClock(r.Clock, mr.Status.Conditions, resourcesv1alpha1.ResourcesProgressing)

for _, ref := range mr.Status.Resources {
-// only resources in the apps/v1 and monitoring.coreos.com/v1 API groups are considered for Progressing condition
-if !sets.New(appsv1.GroupName, monitoring.GroupName).Has(ref.GroupVersionKind().Group) {
+// Skip API groups that are irrelevant for progressing checks.
+if !sets.New(appsv1.GroupName, monitoring.GroupName, certv1alpha1.GroupName).Has(ref.GroupVersionKind().Group) {
continue
}

@@ -122,6 +123,10 @@ func (r *Reconciler) reconcile(ctx context.Context, log logr.Logger, mr *resourc
obj = &monitoringv1.Prometheus{}
case "Alertmanager":
obj = &monitoringv1.Alertmanager{}
+case "Certificate":
+obj = &certv1alpha1.Certificate{}
+case "Issuer":
+obj = &certv1alpha1.Issuer{}
default:
continue
}
@@ -221,6 +226,12 @@ func (r *Reconciler) checkProgressing(ctx context.Context, obj client.Object) (b

case *monitoringv1.Alertmanager:
progressing, reason = health.IsAlertmanagerProgressing(o)
+
+case *certv1alpha1.Certificate:
+progressing, reason = health.IsCertificateProgressing(o)
+
+case *certv1alpha1.Issuer:
+progressing, reason = health.IsCertificateIssuerProgressing(o)
}

return progressing, reason, nil
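The progressing reconciler first skips references whose API group is irrelevant and then switches on the `Kind` to pick a typed object. The following is a minimal stdlib-only sketch of that two-step filter. It assumes a simplified, hypothetical `resourceRef` type; the real code reads `mr.Status.Resources` and uses `sets.New` from `k8s.io/apimachinery`. The group names are `apps`, `monitoring.coreos.com`, and `cert.gardener.cloud`, and the kinds are the ones listed in the documentation above.

```go
package main

import "fmt"

// resourceRef is a hypothetical, simplified reference to an object listed in
// a ManagedResource's status.
type resourceRef struct {
	Group string
	Kind  string
	Name  string
}

// progressingGroups holds the API groups considered for progressing checks.
var progressingGroups = map[string]struct{}{
	"apps":                  {},
	"monitoring.coreos.com": {},
	"cert.gardener.cloud":   {},
}

// progressingKinds lists the kinds that have a progressing check.
var progressingKinds = map[string]struct{}{
	"Deployment": {}, "StatefulSet": {}, "DaemonSet": {},
	"Prometheus": {}, "Alertmanager": {},
	"Certificate": {}, "Issuer": {},
}

// relevantForProgressing applies the group filter first and the kind filter second.
func relevantForProgressing(ref resourceRef) bool {
	if _, ok := progressingGroups[ref.Group]; !ok {
		return false
	}
	_, ok := progressingKinds[ref.Kind]
	return ok
}

func main() {
	refs := []resourceRef{
		{Group: "apps", Kind: "Deployment", Name: "web"},
		{Group: "cert.gardener.cloud", Kind: "Certificate", Name: "tls"},
		{Group: "cert-manager.io", Kind: "Certificate", Name: "other"},
		{Group: "apps", Kind: "ReplicaSet", Name: "web-abc"},
	}
	for _, ref := range refs {
		fmt.Printf("%s/%s %s: %v\n", ref.Group, ref.Kind, ref.Name, relevantForProgressing(ref))
	}
}
```

Filtering by group before switching on `Kind` matters because kind names are not unique across groups. For example, cert-manager (`cert-manager.io`) also defines `Certificate` and `Issuer` kinds, and those are excluded here.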
5 changes: 5 additions & 0 deletions pkg/resourcemanager/controller/health/utils/health_checker.go
@@ -17,6 +17,7 @@ package utils
import (
"context"

+certv1alpha1 "github.com/gardener/cert-management/pkg/apis/cert/v1alpha1"
monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
appsv1 "k8s.io/api/apps/v1"
batchv1 "k8s.io/api/batch/v1"
@@ -95,6 +96,10 @@ func CheckHealth(obj client.Object) (bool, error) {
return true, health.CheckAlertmanager(o)
case *vpaautoscalingv1.VerticalPodAutoscaler:
return true, health.CheckVerticalPodAutoscaler(o)
+case *certv1alpha1.Certificate:
+return true, health.CheckCertificate(o)
+case *certv1alpha1.Issuer:
+return true, health.CheckCertificateIssuer(o)
}

return false, nil
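`CheckHealth` returns two values: whether the object's type has a health check at all, and that check's error. Types without a check fall through to `return false, nil`. The stdlib-only sketch below illustrates this contract with a Go type switch. The `certificate`, `issuer`, and `configMap` structs and the `requireReady` helper are hypothetical simplifications, not the real `certv1alpha1` types or `health` package functions.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real API types.
type certificate struct{ State string }
type issuer struct{ State string }
type configMap struct{}

// requireReady returns an error unless state equals "Ready".
func requireReady(kind, state string) error {
	if state != "Ready" {
		return fmt.Errorf("%s state is %q (%q expected)", kind, state, "Ready")
	}
	return nil
}

// checkHealth reports whether obj's type has a health check (handled) and,
// if so, that check's result. Unhandled types return (false, nil).
func checkHealth(obj any) (handled bool, err error) {
	switch o := obj.(type) {
	case *certificate:
		return true, requireReady("certificate", o.State)
	case *issuer:
		return true, requireReady("issuer", o.State)
	}
	return false, nil
}

func main() {
	for _, obj := range []any{&certificate{State: "Ready"}, &issuer{State: "Error"}, &configMap{}} {
		handled, err := checkHealth(obj)
		fmt.Printf("%T handled=%v err=%v\n", obj, handled, err)
	}
}
```

Returning `handled` separately keeps "no check exists for this type" distinct from "checked and healthy", since both yield a `nil` error.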
53 changes: 53 additions & 0 deletions pkg/resourcemanager/controller/health/utils/health_checker_test.go
@@ -15,6 +15,7 @@
package utils_test

import (
+certv1alpha1 "github.com/gardener/cert-management/pkg/apis/cert/v1alpha1"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
@@ -530,4 +531,56 @@ var _ = Describe("CheckHealth", func() {

testSuite()
})
+
+Context("Certificate", func() {
+BeforeEach(func() {
+healthyReadyCondition := metav1.Condition{Type: "Ready", Status: "True"}
+unhealthyReadyCondition := metav1.Condition{Type: "Ready", Status: "False"}
+
+healthy = &certv1alpha1.Certificate{
+Status: certv1alpha1.CertificateStatus{
+State: "Ready",
+Conditions: []metav1.Condition{healthyReadyCondition},
+},
+}
+
+unhealthy = &certv1alpha1.Certificate{
+Status: certv1alpha1.CertificateStatus{
+Conditions: []metav1.Condition{unhealthyReadyCondition},
+},
+}
+
+unhealthyWithSkipHealthCheckAnnotation = &certv1alpha1.Certificate{
+ObjectMeta: metav1.ObjectMeta{
+Annotations: map[string]string{
+resourcesv1alpha1.SkipHealthCheck: "true",
+},
+},
+Status: certv1alpha1.CertificateStatus{Conditions: []metav1.Condition{unhealthyReadyCondition}},
+}
+})
+
+testSuite()
+})
+
+Context("Certificate Issuer", func() {
+BeforeEach(func() {
+healthy = &certv1alpha1.Issuer{
+Status: certv1alpha1.IssuerStatus{
+State: "Ready",
+},
+}
+
+unhealthy = &certv1alpha1.Issuer{}
+unhealthyWithSkipHealthCheckAnnotation = &certv1alpha1.Issuer{
+ObjectMeta: metav1.ObjectMeta{
+Annotations: map[string]string{
+resourcesv1alpha1.SkipHealthCheck: "true",
+},
+},
+}
+})
+
+testSuite()
+})
})
66 changes: 66 additions & 0 deletions pkg/utils/kubernetes/health/certificate.go
@@ -0,0 +1,66 @@
// Copyright 2024 SAP SE or an SAP affiliate company. All rights reserved. This file is licensed under the Apache Software License, v. 2 except as noted otherwise in the LICENSE file
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package health

import (
"fmt"

certv1alpha1 "github.com/gardener/cert-management/pkg/apis/cert/v1alpha1"
corev1 "k8s.io/api/core/v1"
)

// CheckCertificate checks whether the given certificate object is healthy.
func CheckCertificate(cert *certv1alpha1.Certificate) error {
for _, condition := range cert.Status.Conditions {
if condition.Type == certv1alpha1.CertificateConditionReady {
if err := checkConditionState(condition.Type, string(corev1.ConditionTrue), string(condition.Status), condition.Reason, condition.Message); err != nil {
return err
}
break
}
}

if certState := cert.Status.State; certState != certv1alpha1.StateReady {
return fmt.Errorf("certificate state is %q (%q expected)", certState, certv1alpha1.StateReady)
}
return nil
}

// IsCertificateProgressing returns false if the Certificate's generation matches the observed generation.
func IsCertificateProgressing(cert *certv1alpha1.Certificate) (bool, string) {
if cert.Status.ObservedGeneration < cert.Generation {
return true, fmt.Sprintf("observed generation outdated (%d/%d)", cert.Status.ObservedGeneration, cert.Generation)
}

return false, "Certificate is fully rolled out"
}

// CheckCertificateIssuer checks whether the given issuer object is healthy.
func CheckCertificateIssuer(issuer *certv1alpha1.Issuer) error {
if issuerState := issuer.Status.State; issuerState != certv1alpha1.StateReady {
return fmt.Errorf("issuer state is %q (%q expected)", issuerState, certv1alpha1.StateReady)
}

return nil
}

// IsCertificateIssuerProgressing returns false if the Issuer's generation matches the observed generation.
func IsCertificateIssuerProgressing(issuer *certv1alpha1.Issuer) (bool, string) {
if issuer.Status.ObservedGeneration < issuer.Generation {
return true, fmt.Sprintf("observed generation outdated (%d/%d)", issuer.Status.ObservedGeneration, issuer.Generation)
}

return false, "Issuer is fully rolled out"
}
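The new `certificate.go` combines two signals for a `Certificate`. First, a `Ready` condition, if present, must be `True`. Second, `status.state` must be `Ready`. "Progressing" is derived only from `status.observedGeneration` lagging behind `metadata.generation`. The stdlib-only model below follows the same order of checks. It is a sketch under assumptions: the `condition`, `certStatus`, and `cert` types are hypothetical simplifications of `metav1.Condition` and `certv1alpha1.Certificate`, and the error text for the condition check is invented here because `checkConditionState` is not shown in this diff.

```go
package main

import "fmt"

// condition is a simplified stand-in for metav1.Condition.
type condition struct {
	Type, Status, Reason, Message string
}

// certStatus and cert are simplified stand-ins for the Certificate API types.
type certStatus struct {
	State              string
	ObservedGeneration int64
	Conditions         []condition
}

type cert struct {
	Generation int64
	Status     certStatus
}

const (
	conditionReady = "Ready"
	stateReady     = "Ready"
)

// checkCert mirrors CheckCertificate: a Ready condition, if present, must be
// "True"; afterwards status.state must be "Ready".
func checkCert(c cert) error {
	for _, cond := range c.Status.Conditions {
		if cond.Type == conditionReady {
			if cond.Status != "True" {
				return fmt.Errorf("condition %q has status %q (%q expected): %s: %s",
					cond.Type, cond.Status, "True", cond.Reason, cond.Message)
			}
			break // only the first Ready condition is evaluated
		}
	}
	if c.Status.State != stateReady {
		return fmt.Errorf("certificate state is %q (%q expected)", c.Status.State, stateReady)
	}
	return nil
}

// isCertProgressing mirrors IsCertificateProgressing: the controller has not
// yet processed the latest spec while observedGeneration < generation.
func isCertProgressing(c cert) (bool, string) {
	if c.Status.ObservedGeneration < c.Generation {
		return true, fmt.Sprintf("observed generation outdated (%d/%d)", c.Status.ObservedGeneration, c.Generation)
	}
	return false, "Certificate is fully rolled out"
}

func main() {
	c := cert{Generation: 2, Status: certStatus{State: "Pending", ObservedGeneration: 1}}
	fmt.Println(checkCert(c))
	fmt.Println(isCertProgressing(c))
}
```

The `Issuer` checks have the same shape without the condition loop: only `status.state` is compared for health, and the generation comparison drives the progressing result.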