[GEP-19] Migrate aggregate Prometheus deployment and configuration (gardener#9200)

* Integrate aggregate Prometheus deployment into Seed controller

`Ingress`, rules, and configs are still missing; they will follow in separate commits.

* Update Plutono config for targeting aggregate Prometheus

* Incorporate the health of the `prometheus-aggregate` `ManagedResource` in the seed care controller

* `Ingress`

* External labels

* Alerting configuration

* Translate `vali` scrape config and rules

* Translate `fluent-bit` scrape config and rules

* Translate remaining Prometheus rules

The `metering.rules.stateful.yaml` file is exactly the same as the one used by the cache Prometheus. It is generated by a bash script (https://github.com/gardener/gardener/blob/master/pkg/component/monitoring/prometheus/cache/assets/prometheusrules/metering.rules.stateful.sh), so we simply copy it from there and reuse it.

* Translate `shoot-prometheus` scrape config

* Translate cache `prometheus` scrape config

We use a `ScrapeConfig` resource here because we explicitly want `role=service` in the `kubernetes_sd_configs` (a `ServiceMonitor` would be translated into endpoint-based service discovery instead).

* Translate `istio` scrape configs

* Delete no longer needed code

* Harmonize Prometheus component instantiation

* Move vali constants into new `vali/constants` package

We already follow this approach to reduce transitive imports for other components, see:
- pkg/component/coredns/constants
- pkg/component/etcd/constants
- pkg/component/kubeapiserver/constants
- pkg/component/nodelocaldns/constants
- pkg/component/resourcemanager/constants
- pkg/component/vpa/constants

Without this, we would introduce some undesired transitive package imports, requiring an update of the Skaffold config:

```
>> Checking defined dependencies in Skaffold config 'provider-local' for 'gardener-extension-provider-local' in 'skaffold.yaml'...
>>> The following actual dependencies are missing (need to be added):
pkg/component/etcd
pkg/component/etcd/constants
pkg/component/monitoring
pkg/component/monitoring/alertmanager
pkg/component/monitoring/prometheus
pkg/component/monitoring/prometheus/cache

>>> The following dependencies are not needed actually (need to be removed):

>>> Run './hack/update-skaffold-deps.sh' to fix.

>> Checking defined dependencies in Skaffold config 'gardenlet' for 'gardener-node-agent' in 'skaffold.yaml'...
>>> The following actual dependencies are missing (need to be added):
pkg/component/etcd
pkg/component/etcd/constants
pkg/component/monitoring
pkg/component/monitoring/alertmanager
pkg/component/monitoring/prometheus
pkg/component/monitoring/prometheus/cache
pkg/extensions
pkg/utils/kubernetes/unstructured

>> Checking defined dependencies in Skaffold config 'gardener-operator' for 'gardener-operator' in 'skaffold-operator.yaml'...
>>> The following actual dependencies are missing (need to be added):
pkg/component/monitoring
pkg/component/monitoring/alertmanager
pkg/component/monitoring/prometheus

>>> The following dependencies are not needed actually (need to be removed):

>>> Run './hack/update-skaffold-deps.sh' to fix.
```

This approach actually allows us to drop some package dependencies.

* Address PR review feedback
rfranzke authored Feb 28, 2024
1 parent 2e1feec commit 2416f88
Showing 73 changed files with 2,201 additions and 1,841 deletions.
```diff
@@ -95,6 +95,7 @@ rules:
  - vali-vali-0
  - prometheus-db-prometheus-0 # TODO(rfranzke): Remove this as soon as the Prometheus migration code is getting deleted.
  - prometheus-db-seed-prometheus-0 # TODO(rfranzke): Remove this as soon as the Prometheus migration code is getting deleted.
+ - prometheus-db-aggregate-prometheus-0 # TODO(rfranzke): Remove this as soon as the Prometheus migration code is getting deleted.
  verbs:
  - delete
  - apiGroups:
```
`docs/development/priority-classes.md`: 16 changes (8 additions, 8 deletions)

```diff
@@ -32,14 +32,14 @@ When using the `gardener-operator` for managing the garden runtime and virtual c

  ### `PriorityClass`es for Seed System Components

- | Name | Priority | Associated Components (Examples) |
- |------------------------------------|-----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
- | `gardener-system-critical` | 999998950 | `gardenlet`, `gardener-resource-manager`, `istio-ingressgateway`, `istiod` |
- | `gardener-system-900` | 999998900 | Extensions, `reversed-vpn-auth-server` |
- | `gardener-system-800` | 999998800 | `dependency-watchdog-endpoint`, `dependency-watchdog-probe`, `etcd-druid`, `(auditlog-)mutator`, `vpa-admission-controller` |
- | `gardener-system-700` | 999998700 | `auditlog-seed-controller`, `hvpa-controller`, `vpa-recommender`, `vpa-updater` |
- | `gardener-system-600` | 999998600 | `aggregate-prometheus`, `alertmanager-seed`, `fluent-operator`, `fluent-bit`, `plutono`, `kube-state-metrics`, `nginx-ingress-controller`, `nginx-k8s-backend`, `prometheus-operator`, `prometheus-cache`, `vali`, `seed-prometheus` |
- | `gardener-reserve-excess-capacity` | -5 | `reserve-excess-capacity` ([ref](https://github.com/gardener/gardener/pull/6135)) |
+ | Name | Priority | Associated Components (Examples) |
+ |------------------------------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | `gardener-system-critical` | 999998950 | `gardenlet`, `gardener-resource-manager`, `istio-ingressgateway`, `istiod` |
+ | `gardener-system-900` | 999998900 | Extensions, `reversed-vpn-auth-server` |
+ | `gardener-system-800` | 999998800 | `dependency-watchdog-endpoint`, `dependency-watchdog-probe`, `etcd-druid`, `(auditlog-)mutator`, `vpa-admission-controller` |
+ | `gardener-system-700` | 999998700 | `auditlog-seed-controller`, `hvpa-controller`, `vpa-recommender`, `vpa-updater` |
+ | `gardener-system-600` | 999998600 | `alertmanager-seed`, `fluent-operator`, `fluent-bit`, `plutono`, `kube-state-metrics`, `nginx-ingress-controller`, `nginx-k8s-backend`, `prometheus-operator`, `prometheus-aggregate`, `prometheus-cache`, `prometheus-seed`, `vali` |
+ | `gardener-reserve-excess-capacity` | -5 | `reserve-excess-capacity` ([ref](https://github.com/gardener/gardener/pull/6135)) |

  ### `PriorityClass`es for Shoot Control Plane Components

```
`docs/extensions/logging-and-monitoring.md`: 33 changes (31 additions, 2 deletions)

```diff
@@ -19,7 +19,7 @@ This guide is about the roles and extensibility options of the logging and monit
  The central Prometheus instance in the `garden` namespace (called "cache Prometheus") fetches metrics and data from all seed cluster nodes and all seed cluster pods.
  It uses the [federation](https://prometheus.io/docs/prometheus/latest/federation/) concept to allow the shoot-specific instances to scrape only the metrics for the pods of the control plane they are responsible for.
  This mechanism allows to scrape the metrics for the nodes/pods once for the whole cluster, and to have them distributed afterwards.
- For more details, continue reading [here](../development/monitoring-stack.md#overview).
+ For more details, continue reading [here](../monitoring/README.md#prometheus).

  Typically, this is not necessary, but in case an extension wants to extend the configuration for this cache Prometheus, they can create the [`prometheus-operator`'s custom resources](https://github.com/prometheus-operator/prometheus-operator?tab=readme-ov-file#customresourcedefinitions) and label them with `prometheus=cache`, for example:
```

```diff
@@ -48,7 +48,7 @@ spec:
  Another Prometheus instance in the `garden` namespace (called "seed Prometheus") fetches metrics and data from seed system components, kubelets, cAdvisors, and extensions.
  If you want your extension pods to be scraped then they must be annotated with `prometheus.io/scrape=true` and `prometheus.io/port=<metrics-port>`.
- For more details, continue reading [here](../development/monitoring-stack.md#overview).
+ For more details, continue reading [here](../monitoring/README.md#seed-prometheus).

  Typically, this is not necessary, but in case an extension wants to extend the configuration for this seed Prometheus, they can create the [`prometheus-operator`'s custom resources](https://github.com/prometheus-operator/prometheus-operator?tab=readme-ov-file#customresourcedefinitions) and label them with `prometheus=seed`, for example:

```
````diff
@@ -73,6 +73,35 @@ spec:
  port: metrics
  ```

+ ### Aggregate Prometheus
+
+ Another Prometheus instance in the `garden` namespace (called "aggregate Prometheus") stores pre-aggregated data from the cache Prometheus and shoot Prometheis.
+ An ingress exposes this Prometheus instance allowing it to be scraped from another cluster.
+ For more details, continue reading [here](../monitoring/README.md#aggregate-prometheus).
+
+ Typically, this is not necessary, but in case an extension wants to extend the configuration for this aggregate Prometheus, they can create the [`prometheus-operator`'s custom resources](https://github.com/prometheus-operator/prometheus-operator?tab=readme-ov-file#customresourcedefinitions) and label them with `prometheus=aggregate`, for example:
+
+ ```yaml
+ apiVersion: monitoring.coreos.com/v1
+ kind: ServiceMonitor
+ metadata:
+   labels:
+     prometheus: aggregate
+   name: aggregate-my-component
+   namespace: garden
+ spec:
+   selector:
+     matchLabels:
+       app: my-component
+   endpoints:
+   - metricRelabelings:
+     - action: keep
+       regex: ^(metric1|metric2|...)$
+       sourceLabels:
+       - __name__
+     port: metrics
+ ```
+
  ### Shoot Cluster Prometheus

  The shoot-specific metrics are then made available to operators and users in the shoot Plutono, using the shoot Prometheus as data source.
````
`hack/test-prometheus.sh`: 7 changes (1 addition, 6 deletions)

```diff
@@ -20,12 +20,7 @@ set -o pipefail

  echo "> Test Prometheus"

- echo "Executing Prometheus alert tests"
+ echo "Executing shoot Prometheus alert tests"
  pushd "$(dirname $0)/../pkg/component/monitoring/charts/seed-monitoring/charts/core/charts/prometheus" > /dev/null
  promtool test rules rules-tests/*test.yaml
  popd > /dev/null
-
- echo "Executing aggregate Prometheus alert tests"
- pushd "$(dirname $0)/../pkg/component/monitoring/charts/bootstrap/aggregate-prometheus-rules-tests" > /dev/null
- promtool test rules *test.yaml
- popd > /dev/null
```
```diff
@@ -30,7 +30,7 @@ import (
  extensionsv1alpha1 "github.com/gardener/gardener/pkg/apis/extensions/v1alpha1"
  resourcesv1alpha1 "github.com/gardener/gardener/pkg/apis/resources/v1alpha1"
  "github.com/gardener/gardener/pkg/client/kubernetes"
- "github.com/gardener/gardener/pkg/component/logging/vali"
+ valiconstants "github.com/gardener/gardener/pkg/component/logging/vali/constants"
  "github.com/gardener/gardener/pkg/features"
  nodeagentv1alpha1 "github.com/gardener/gardener/pkg/nodeagent/apis/config/v1alpha1"
  "github.com/gardener/gardener/pkg/utils"
@@ -224,7 +224,7 @@ func GenerateRBACResourcesData(secretNames []string) (map[string][]byte, error)
  {
  APIGroups: []string{""},
  Resources: []string{"secrets"},
- ResourceNames: append(secretNames, Name, vali.ValitailTokenSecretName,
+ ResourceNames: append(secretNames, Name, valiconstants.ValitailTokenSecretName,
  // This is needed for migration from cloud-config-downloader to gardener-node-agent: The CCD
  // token will be used to fetch the GNA token, hence it needs permissions to read the secret.
  "gardener-node-agent",
```
```diff
@@ -33,7 +33,7 @@ import (
  extensionsv1alpha1helper "github.com/gardener/gardener/pkg/apis/extensions/v1alpha1/helper"
  "github.com/gardener/gardener/pkg/component/extensions/operatingsystemconfig/original/components"
  "github.com/gardener/gardener/pkg/component/extensions/operatingsystemconfig/original/components/valitail"
- "github.com/gardener/gardener/pkg/component/logging/vali"
+ valiconstants "github.com/gardener/gardener/pkg/component/logging/vali/constants"
  nodeagentv1alpha1 "github.com/gardener/gardener/pkg/nodeagent/apis/config/v1alpha1"
  "github.com/gardener/gardener/pkg/utils"
  )
@@ -73,7 +73,7 @@ func (component) Config(ctx components.Context) ([]extensionsv1alpha1.Unit, []ex
  var additionalTokenSyncConfigs []nodeagentv1alpha1.TokenSecretSyncConfig
  if ctx.ValitailEnabled {
  additionalTokenSyncConfigs = append(additionalTokenSyncConfigs, nodeagentv1alpha1.TokenSecretSyncConfig{
- SecretName: vali.ValitailTokenSecretName,
+ SecretName: valiconstants.ValitailTokenSecretName,
  Path: valitail.PathAuthToken,
  })
  }
```
```diff
@@ -22,7 +22,7 @@ import (
  bootstraptokenapi "k8s.io/cluster-bootstrap/token/api"

  "github.com/gardener/gardener/pkg/client/kubernetes"
- "github.com/gardener/gardener/pkg/component/logging/vali"
+ valiconstants "github.com/gardener/gardener/pkg/component/logging/vali/constants"
  nodeagentv1alpha1 "github.com/gardener/gardener/pkg/nodeagent/apis/config/v1alpha1"
  "github.com/gardener/gardener/pkg/utils/managedresources"
  )
@@ -75,7 +75,7 @@ func RBACResourcesData(secretNames []string) (map[string][]byte, error) {
  {
  APIGroups: []string{""},
  Resources: []string{"secrets"},
- ResourceNames: append([]string{nodeagentv1alpha1.AccessSecretName, vali.ValitailTokenSecretName}, secretNames...),
+ ResourceNames: append([]string{nodeagentv1alpha1.AccessSecretName, valiconstants.ValitailTokenSecretName}, secretNames...),
  Verbs: []string{"get", "list", "watch"},
  },
  {
```
```diff
@@ -30,7 +30,7 @@ import (
  resourcesv1alpha1 "github.com/gardener/gardener/pkg/apis/resources/v1alpha1"
  "github.com/gardener/gardener/pkg/component/extensions/operatingsystemconfig/downloader"
  "github.com/gardener/gardener/pkg/component/extensions/operatingsystemconfig/original/components"
- "github.com/gardener/gardener/pkg/component/logging/vali"
+ valiconstants "github.com/gardener/gardener/pkg/component/logging/vali/constants"
  "github.com/gardener/gardener/pkg/features"
  nodeagentv1alpha1 "github.com/gardener/gardener/pkg/nodeagent/apis/config/v1alpha1"
  "github.com/gardener/gardener/pkg/utils"
@@ -200,7 +200,7 @@ func getFetchTokenScriptFile() (extensionsv1alpha1.File, error) {
  "pathCredentialsCACert": downloader.PathCredentialsCACert,
  "pathAuthToken": PathAuthToken,
  "dataKeyToken": resourcesv1alpha1.DataKeyToken,
- "secretName": vali.ValitailTokenSecretName,
+ "secretName": valiconstants.ValitailTokenSecretName,
  }); err != nil {
  return extensionsv1alpha1.File{}, err
  }
```
`pkg/component/istio/istio_test.go`: 26 changes (19 additions, 7 deletions)

```diff
@@ -147,6 +147,11 @@ var _ = Describe("istiod", func() {
  return strings.ReplaceAll(string(data), "<CHECKSUM>", checksum)
  }

+ istiodServiceMonitor = func() string {
+ data, _ := os.ReadFile("./test_charts/istiod_servicemonitor.yaml")
+ return string(data)
+ }
+
  istioIngressAutoscaler = func(min *int, max *int) string {
  data, _ := os.ReadFile("./test_charts/ingress_autoscaler.yaml")
  str := strings.ReplaceAll(string(data), "<MIN_REPLICAS>", strconv.Itoa(ptr.Deref(min, 2)))
@@ -224,6 +229,11 @@ var _ = Describe("istiod", func() {
  return strings.ReplaceAll(string(data), "<REPLICAS>", strconv.Itoa(ptr.Deref(replicas, 2)))
  }

+ istioIngressServiceMonitor = func() string {
+ data, _ := os.ReadFile("./test_charts/ingress_servicemonitor.yaml")
+ return string(data)
+ }
+
  istioProxyProtocolEnvoyFilter = func() string {
  data, _ := os.ReadFile("./test_charts/proxyprotocol_envoyfilter.yaml")
  return string(data)
@@ -371,7 +381,7 @@ var _ = Describe("istiod", func() {

  Expect(c.Get(ctx, client.ObjectKeyFromObject(managedResourceIstioSecret), managedResourceIstioSecret)).To(Succeed())
  Expect(managedResourceIstioSecret.Type).To(Equal(corev1.SecretTypeOpaque))
- Expect(managedResourceIstioSecret.Data).To(HaveLen(14))
+ Expect(managedResourceIstioSecret.Data).To(HaveLen(15))
  Expect(managedResourceIstioSecret.Immutable).To(Equal(ptr.To(true)))
  Expect(managedResourceIstioSecret.Labels["resources.gardener.cloud/garbage-collectable-reference"]).To(Equal("true"))

@@ -383,6 +393,7 @@ var _ = Describe("istiod", func() {
  Expect(diffConfig(string(managedResourceIstioSecret.Data["istio-ingress_templates_service_test-ingress.yaml"]), istioIngressService())).To(BeEmpty())
  Expect(diffConfig(string(managedResourceIstioSecret.Data["istio-ingress_templates_serviceaccount_test-ingress.yaml"]), istioIngressServiceAccount())).To(BeEmpty())
  Expect(diffConfig(string(managedResourceIstioSecret.Data["istio-ingress_templates_deployment_test-ingress.yaml"]), istioIngressDeployment(nil))).To(BeEmpty())
+ Expect(diffConfig(string(managedResourceIstioSecret.Data["servicemonitor__istio-system__aggregate-istio-ingressgateway.yaml"]), istioIngressServiceMonitor())).To(BeEmpty())

  By("Verify istio-proxy-protocol resources")
  Expect(diffConfig(string(managedResourceIstioSecret.Data["istio-ingress_templates_proxy-protocol-envoyfilter_test-ingress.yaml"]), istioProxyProtocolEnvoyFilter())).To(BeEmpty())
@@ -400,7 +411,7 @@ var _ = Describe("istiod", func() {

  Expect(c.Get(ctx, client.ObjectKeyFromObject(managedResourceIstioSystemSecret), managedResourceIstioSystemSecret)).To(Succeed())
  Expect(managedResourceIstioSystemSecret.Type).To(Equal(corev1.SecretTypeOpaque))
- Expect(managedResourceIstioSystemSecret.Data).To(HaveLen(16))
+ Expect(managedResourceIstioSystemSecret.Data).To(HaveLen(17))
  Expect(managedResourceIstioSystemSecret.Immutable).To(Equal(ptr.To(true)))
  Expect(managedResourceIstioSystemSecret.Labels["resources.gardener.cloud/garbage-collectable-reference"]).To(Equal("true"))

@@ -421,6 +432,7 @@ var _ = Describe("istiod", func() {
  Expect(diffConfig(string(managedResourceIstioSystemSecret.Data["istio-istiod_templates_serviceaccount.yaml"]), istiodServiceAccount())).To(BeEmpty())
  Expect(diffConfig(string(managedResourceIstioSystemSecret.Data["istio-istiod_templates_autoscale.yaml"]), istiodAutoscale())).To(BeEmpty())
  Expect(diffConfig(string(managedResourceIstioSystemSecret.Data["istio-istiod_templates_validatingwebhookconfiguration.yaml"]), istiodValidationWebhook())).To(BeEmpty())
+ Expect(diffConfig(string(managedResourceIstioSystemSecret.Data["servicemonitor__istio-system__aggregate-istiod.yaml"]), istiodServiceMonitor())).To(BeEmpty())
  })

  Context("kubernetes version < 1.26", func() {
@@ -446,12 +458,12 @@ var _ = Describe("istiod", func() {

  It("should succesfully deploy pdb with the correct spec", func() {
  Expect(c.Get(ctx, client.ObjectKeyFromObject(managedResourceIstioSecret), managedResourceIstioSecret)).To(Succeed())
- Expect(managedResourceIstioSecret.Data).To(HaveLen(14))
+ Expect(managedResourceIstioSecret.Data).To(HaveLen(15))

  Expect(c.Get(ctx, client.ObjectKeyFromObject(managedResourceIstioSystem), managedResourceIstioSystem)).To(Succeed())
  managedResourceIstioSystemSecret.Name = managedResourceIstioSystem.Spec.SecretRefs[0].Name
  Expect(c.Get(ctx, client.ObjectKeyFromObject(managedResourceIstioSystemSecret), managedResourceIstioSystemSecret)).To(Succeed())
- Expect(managedResourceIstioSystemSecret.Data).To(HaveLen(16))
+ Expect(managedResourceIstioSystemSecret.Data).To(HaveLen(17))

  Expect(diffConfig(string(managedResourceIstioSecret.Data["istio-ingress_templates_poddisruptionbudget_test-ingress.yaml"]), istioIngressPodDisruptionBudgetLess126())).To(BeEmpty())
  Expect(diffConfig(string(managedResourceIstioSystemSecret.Data["istio-istiod_templates_poddisruptionbudget.yaml"]), istiodPodDisruptionBudgetLess126())).To(BeEmpty())
@@ -651,7 +663,7 @@ var _ = Describe("istiod", func() {
  It("should successfully deploy all resources", func() {
  Expect(c.Get(ctx, client.ObjectKeyFromObject(managedResourceIstioSecret), managedResourceIstioSecret)).To(Succeed())
  Expect(managedResourceIstioSecret.Type).To(Equal(corev1.SecretTypeOpaque))
- Expect(managedResourceIstioSecret.Data).To(HaveLen(11))
+ Expect(managedResourceIstioSecret.Data).To(HaveLen(12))

  Expect(string(managedResourceIstioSecret.Data["istio-ingress_templates_vpn-envoy-filter_test-ingress.yaml"])).To(BeEmpty())
  Expect(diffConfig(string(managedResourceIstioSecret.Data["istio-ingress_templates_deployment_test-ingress.yaml"]), istioIngressDeployment(nil))).To(BeEmpty())
@@ -683,7 +695,7 @@ var _ = Describe("istiod", func() {
  It("should successfully deploy all resources", func() {
  Expect(c.Get(ctx, client.ObjectKeyFromObject(managedResourceIstioSecret), managedResourceIstioSecret)).To(Succeed())
  Expect(managedResourceIstioSecret.Type).To(Equal(corev1.SecretTypeOpaque))
- Expect(managedResourceIstioSecret.Data).To(HaveLen(11))
+ Expect(managedResourceIstioSecret.Data).To(HaveLen(12))

  Expect(string(managedResourceIstioSecret.Data["istio-ingress_templates_proxy-protocol-envoyfilter_test-ingress.yaml"])).To(BeEmpty())
  Expect(string(managedResourceIstioSecret.Data["istio-ingress_templates_proxy-protocol-gateway_test-ingress.yaml"])).To(BeEmpty())
@@ -716,7 +728,7 @@ var _ = Describe("istiod", func() {
  It("should successfully deploy all resources", func() {
  Expect(c.Get(ctx, client.ObjectKeyFromObject(managedResourceIstioSecret), managedResourceIstioSecret)).To(Succeed())
  Expect(managedResourceIstioSecret.Type).To(Equal(corev1.SecretTypeOpaque))
- Expect(managedResourceIstioSecret.Data).To(HaveLen(11))
+ Expect(managedResourceIstioSecret.Data).To(HaveLen(12))

  Expect(managedResourceIstioSecret.Data).ToNot(HaveKey("istio-istiod_templates_configmap.yaml"))
  Expect(managedResourceIstioSecret.Data).ToNot(HaveKey("istio-istiod_templates_deployment.yaml"))
```
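The two new assertions in `istio_test.go` look up keys of the form `<kind>__<namespace>__<name>.yaml` in the managed resource secrets. Each secret presumably gains exactly one such `ServiceMonitor` entry, which matches the expected `HaveLen` counts growing by one (14 to 15, 16 to 17, 11 to 12). A small Go sketch of that key shape, inferred only from the keys asserted in the test; `managedResourceDataKey` is a hypothetical helper, and Gardener's real serialization code may differ (for example, in how it handles special characters in names).

```go
package main

import (
	"fmt"
	"strings"
)

// managedResourceDataKey is a hypothetical helper reproducing the
// `<lowercased kind>__<namespace>__<name>.yaml` key shape seen in the test.
func managedResourceDataKey(kind, namespace, name string) string {
	return fmt.Sprintf("%s__%s__%s.yaml", strings.ToLower(kind), namespace, name)
}

func main() {
	fmt.Println(managedResourceDataKey("ServiceMonitor", "istio-system", "aggregate-istiod"))
	fmt.Println(managedResourceDataKey("ServiceMonitor", "istio-system", "aggregate-istio-ingressgateway"))
}
```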