diff --git a/docs/development/changing-the-api.md b/docs/development/changing-the-api.md index 749dd2cb6b9..3c079fd5dc2 100644 --- a/docs/development/changing-the-api.md +++ b/docs/development/changing-the-api.md @@ -1,37 +1,37 @@ --- -title: Changing the APIs +title: Changing the API --- -# Extending the API +# Changing the API This document describes the steps that need to be performed when changing the API. It provides guidance for API changes to both (Gardener system in general or component configurations). -Generally, as Gardener is a Kubernetes-native extension, it follows the same API conventions and guidelines like Kubernetes itself. -[This document](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md) as well as [this document](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md) already provide a good overview and general explanation of the basic concepts behind it. +Generally, as Gardener is a Kubernetes-native extension, it follows the same API conventions and guidelines like Kubernetes itself. The Kubernetes +[API Conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md) as well as [Changing the API](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md) topics already provide a good overview and general explanation of the basic concepts behind it. We are following the same approaches. ## Gardener API -The Gardener API is defined in `pkg/apis/{core,extensions,settings}` directories and is the main point of interaction with the system. +The Gardener API is defined in the `pkg/apis/{core,extensions,settings}` directories and is the main point of interaction with the system. It must be ensured that the API is always backwards-compatible. -#### Changing the API +### Changing the API **Checklist** when changing the API: -1. Modify the field(s) in the respective Golang files of all external and the internal version. +1. Modify the field(s) in the respective Golang files of all external versions and the internal version. 1. Make sure new fields are being added as "optional" fields, i.e., they are of pointer types, they have the `// +optional` comment, and they have the `omitempty` JSON tag. 1. Make sure that the existing field numbers in the protobuf tags are not changed. -1. If necessary then implement/adapt the conversion logic defined in the versioned APIs (e.g., `pkg/apis/core/v1beta1/conversions*.go`). -1. If necessary then implement/adapt defaulting logic defined in the versioned APIs (e.g., `pkg/apis/core/v1beta1/defaults*.go`). -1. Run the code generation: `make generate` -1. If necessary then implement/adapt validation logic defined in the internal API (e.g., `pkg/apis/core/validation/validation*.go`). -1. If necessary then adapt the exemplary YAML manifests of the Gardener resources defined in `example/*.yaml`. -1. In most cases it makes sense to add/adapt the documentation for administrators/operators and/or end-users in the `docs` folder to provide information on purpose and usage of the added/changed fields. -1. When opening the pull request then always add a release note so that end-users are becoming aware of the changes. +2. If necessary, implement/adapt the conversion logic defined in the versioned APIs (e.g., `pkg/apis/core/v1beta1/conversions*.go`). +3. 
If necessary, implement/adapt defaulting logic defined in the versioned APIs (e.g., `pkg/apis/core/v1beta1/defaults*.go`). +4. Run the code generation: `make generate` +5. If necessary, implement/adapt validation logic defined in the internal API (e.g., `pkg/apis/core/validation/validation*.go`). +6. If necessary, adapt the exemplary YAML manifests of the Gardener resources defined in `example/*.yaml`. +7. In most cases, it makes sense to add/adapt the documentation for administrators/operators and/or end-users in the `docs` folder to provide information on purpose and usage of the added/changed fields. +8. When opening the pull request, always add a release note so that end-users become aware of the changes. -#### Removing a field +### Removing a Field If fields shall be removed permanently from the API, then a proper deprecation period must be adhered to so that end-users have enough time to adapt their clients. @@ -47,21 +47,21 @@ The steps for removing a field from the code base is: + // SeedTemplate *gardencorev1beta1.SeedTemplate `json:"seedTemplate,omitempty" protobuf:"bytes,2,opt,name=seedTemplate"` ``` - The reasoning behind this is to prevent the same protobuf number to be used by a new field. Introducing a new field with the same protobuf number would be a breaking change for clients still using the old protobuf definitions that have the old field for the given protobuf number. + The reasoning behind this is to prevent the same protobuf number from being used by a new field. Introducing a new field with the same protobuf number would be a breaking change for clients still using the old protobuf definitions that have the old field for the given protobuf number. The field in the internal version can be removed. -2. Unit test has to be added to make sure that a new field does not reuse the already reserved protobuf tag. +2. A unit test has to be added to make sure that a new field does not reuse the already reserved protobuf tag. -Example of field removal can be found in https://github.com/gardener/gardener/pull/6972. +An example of field removal can be found in the [Remove `seedTemplate` field from ManagedSeed API](https://github.com/gardener/gardener/pull/6972) PR. -## Component configuration APIs +## Component Configuration APIs Most Gardener components have a component configuration that follows similar principles to the Gardener API. Those component configurations are defined in `pkg/{controllermanager,gardenlet,scheduler},pkg/apis/config`. Hence, the above checklist also applies for changes to those APIs. -However, since these APIs are only used internally and only during the deployment of Gardener the guidelines with respect to changes and backwards-compatibility are slightly relaxed. -If necessary then it is allowed to remove fields without a proper deprecation period if the release note uses the `breaking operator` keywords. +However, since these APIs are only used internally and only during the deployment of Gardener, the guidelines with respect to changes and backwards-compatibility are slightly relaxed. +If necessary, it is allowed to remove fields without a proper deprecation period if the release note uses the `breaking operator` keywords. In addition to the above checklist: -1. If necessary then adapt the Helm chart of Gardener defined in `charts/gardener`. Adapt the `values.yaml` file as well as the manifest templates. +1. If necessary, adapt the Helm chart of Gardener defined in `charts/gardener`. Adapt the `values.yaml` file as well as the manifest templates.
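For illustration only, here is a minimal sketch of what a new optional field in one of the external API versions could look like according to the checklist above (the struct and field names as well as the protobuf number are made up for this example and must not collide with existing or reserved numbers):

```go
// ExampleSpec is a hypothetical external API type used only to illustrate the conventions.
type ExampleSpec struct {
	// ExistingField keeps its protobuf number - existing numbers must never be changed.
	ExistingField string `json:"existingField" protobuf:"bytes,1,opt,name=existingField"`

	// NewOptionalField is a hypothetical new field: pointer type, `+optional` marker, `omitempty` JSON tag.
	// +optional
	NewOptionalField *string `json:"newOptionalField,omitempty" protobuf:"bytes,2,opt,name=newOptionalField"`
}
```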
diff --git a/docs/development/component-checklist.md b/docs/development/component-checklist.md index 80c706d092c..35f1d35b15c 100644 --- a/docs/development/component-checklist.md +++ b/docs/development/component-checklist.md @@ -1,15 +1,15 @@ # Checklist For Adding New Components -Adding new components which run in garden, seed or shoot cluster is theoretically quite simple - we just need a `Deployment` (or similar other workload resource), the respective container image and maybe a bit of configuration. -In practice however, there are a couple of things to keep in mind in order to make the deployment production-ready. -This document provides a checklist for them which you can walk through. +Adding new components that run in the garden, seed, or shoot cluster is theoretically quite simple - we just need a `Deployment` (or other similar workload resource), the respective container image, and maybe a bit of configuration. +In practice, however, there are a couple of things to keep in mind in order to make the deployment production-ready. +This document provides a checklist for them that you can walk through. ## General 1. **Avoid usage of Helm charts** ([example](https://github.com/gardener/gardener/tree/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver)) Nowadays, we use [Golang components](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/interfaces.go) instead of Helm charts for deploying components to a cluster. - Please find a typical structure of such components [here](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L80-L97) (configuration values are typically managed in a `Values` structure). + Please find a typical structure of such components in the provided [metrics_server.go](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L80-L97) file (configuration values are typically managed in a `Values` structure). There are a few exceptions (e.g., [Istio](https://github.com/gardener/gardener/tree/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/istio)) still using charts, however the default should be using a Golang-based implementation. For the exceptional cases, use Golang's [embed](https://pkg.go.dev/embed) package to embed the Helm chart directory ([example 1](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/istio/istiod.go#L51-L52), [example 2](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/istio/istiod.go#L257-L273)). @@ -17,7 +17,7 @@ This document provides a checklist for them which you can walk through. For historic reasons, resources related to shoot control plane components are applied directly with the client. All other resources (seed or shoot system components) are deployed via `gardener-resource-manager`'s [Resource controller](../concepts/resource-manager.md#managedresource-controller) (`ManagedResource`s) since it performs health checks out-of-the-box and has a lot of other features (see its documentation for more information). 
- Components which can run as both seed system component or shoot control plane component (e.g., VPA or `kube-state-metrics`) can make use of [these utility functions](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/resourceconfig.go). + Components that can run as both a seed system component and a shoot control plane component (e.g., VPA or `kube-state-metrics`) can make use of [these utility functions](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/resourceconfig.go). 3. **Do not hard-code container image references** ([example 1](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/charts/images.yaml#L130-L133), [example 2](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/metricsserver.go#L28-L31), [example 3](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L82-L83)) @@ -40,7 +40,7 @@ This document provides a checklist for them which you can walk through. You should use the [secrets manager](secrets_management.md) for the management of any kind of credentials. This makes sure that credentials rotation works out-of-the-box without you requiring to think about it. - Generally, do not use client certificates (see [security section](#security)). + Generally, do not use client certificates (see the [Security section](#security)). 6. **Consider hibernation when calculating replica count** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/kubescheduler.go#L36)) @@ -54,35 +54,35 @@ This document provides a checklist for them which you can walk through. 8. **Handle shoot system components** - Shoot system components deployed by `gardener-resource-manager` are labelled with `resource.gardener.cloud/managed-by: gardener`. This makes Gardener adding required label selectors and tolerations so that non-`DaemonSet` managed `Pod`s will exclusively run on selected nodes, [more information](../concepts/resource-manager.md#system-components-webhook). + Shoot system components deployed by `gardener-resource-manager` are labelled with `resource.gardener.cloud/managed-by: gardener`. This makes Gardener add the required label selectors and tolerations so that non-`DaemonSet` managed `Pod`s will exclusively run on selected nodes (for more information, see [System Components Webhook](../concepts/resource-manager.md#system-components-webhook)). `DaemonSet`s on the other hand, should generally tolerate any `NoSchedule` or `NoExecute` taints so that they can run on any `Node`, regardless of user added taints. ## Security 1. **Use a [dedicated `ServiceAccount`](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/) and disable auto-mount** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L145-L151)) - Components which need to talk to the API server of their runtime cluster must always use a dedicated `ServiceAccount` (do not use `default`) which `automountServiceAccountToken` set to `false`.
This makes `gardener-resource-manager`'s [TokenInvalidator](../concepts/resource-manager.md#tokeninvalidator) invalidating the static token secret and its [`ProjectedTokenMount` webhook](../concepts/resource-manager.md#auto-mounting-projected-serviceaccount-tokens) injecting a projected token automatically. + Components that need to talk to the API server of their runtime cluster must always use a dedicated `ServiceAccount` (do not use `default`), with `automountServiceAccountToken` set to `false`. + This makes `gardener-resource-manager`'s [TokenInvalidator](../concepts/resource-manager.md#tokeninvalidator) invalidate the static token secret and its [`ProjectedTokenMount` webhook](../concepts/resource-manager.md#auto-mounting-projected-serviceaccount-tokens) inject a projected token automatically. 2. **Use shoot access tokens instead of a client certificates** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/kubescheduler/kube_scheduler.go#L227-L229)) - Components which need to talk to a target cluster different from their runtime cluster (e.g., running in seed cluster but talking to shoot) then the `gardener-resource-manager`'s [TokenRequestor](../concepts/resource-manager.md#tokenrequestor) should be used to manage a so-called "shoot access token". + For components that need to talk to a target cluster different from their runtime cluster (e.g., running in seed cluster but talking to shoot), the `gardener-resource-manager`'s [TokenRequestor](../concepts/resource-manager.md#tokenrequestor) should be used to manage a so-called "shoot access token". 3. **Define RBAC roles with minimal privileges** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L153-L223)) - The component's `ServiceAccount` (if exists) should have as little privileges as possible. + The component's `ServiceAccount` (if it exists) should have as few privileges as possible. Consequently, please define proper [RBAC roles](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) for it. This might include a combination of `ClusterRole`s and `Role`s. - Please do not provide elevated privileges due to laziness (e.g., because there is already a `ClusterRole` that can be extended vs. creating a `Role` only when only access to a single namespace is needed). + Please do not provide elevated privileges due to laziness (e.g., because there is already a `ClusterRole` that can be extended vs. creating a `Role` only when access to a single namespace is needed). 4. **Use [`NetworkPolicy`s](https://kubernetes.io/docs/concepts/services-networking/network-policies/) to restrict network traffic** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/etcd/etcd.go#L293-L339)) You should restrict both ingress and egress traffic to/from your component as much as possible to ensure that it only gets access to/from other components if really needed. - Gardener provides a few default policies for typical usage scenarios, please see [this document for seed clusters](seed_network_policies.md) and [this document for shoot clusters](../usage/shoot_network_policies.md). + Gardener provides a few default policies for typical usage scenarios. For more information, see [Seed Network Policies](seed_network_policies.md) and [Shoot Network Policies](../usage/shoot_network_policies.md). 5.
**Do not run components in privileged mode** ([example 1](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/nodelocaldns/nodelocaldns.go#L329-L333), [example 2](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/nodelocaldns/nodelocaldns.go#L507)) - Avoid running components with `privileged=true` and define the needed [Linux capabilities](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container) instead. + Avoid running components with `privileged=true`. Instead, define the needed [Linux capabilities](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container). 6. **Choose the proper Seccomp profile** ([example 1](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/nodelocaldns/nodelocaldns.go#L285-L287), [example 2](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/nginxingress/nginxingress.go#L427)) @@ -93,14 +93,14 @@ This document provides a checklist for them which you can walk through. `PodSecurityPolicy`s are deprecated, however Gardener still supports shoot clusters with older Kubernetes versions ([ref](../usage/supported_k8s_versions.md)). To make sure that such clusters can run with `.spec.kubernetes.allowPrivilegedContainers=false`, you have to define proper `PodSecurityPolicy`s. - See also [this document](../usage/pod-security.md) for more information. + For more information, see [Pod Security](../usage/pod-security.md). ## High Availability / Stability 1. **Specify the component type label for high availability** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/kubescheduler/kube_scheduler.go#L234)) To support high-availability deployments, `gardener-resource-manager`s [HighAvailabilityConfig](../concepts/resource-manager.md#high-availability-config) webhook injects the proper specification like replica or topology spread constraints. - You only need to specify the type label, see also [this document](high-availability.md) for more information. + You only need to specify the type label. For more information, see [High Availability Of Deployed Components](high-availability.md). 2. **Define a `PodDisruptionBudget`** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L398-L422)) @@ -109,7 +109,7 @@ This document provides a checklist for them which you can walk through. 3. **Choose the right `PriorityClass`** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/kubescheduler/kube_scheduler.go#L301)) Each cluster runs many components with different priorities. - Gardener provides a set of default [`PriorityClass`es](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass), see [this document](priority-classes.md) for more information. + Gardener provides a set of default [`PriorityClass`es](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass). For more information, see [Priority Classes](priority-classes.md). 4. 
**Consider defining liveness and readiness probes** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L335-L358)) @@ -120,17 +120,17 @@ This document provides a checklist for them which you can walk through. 1. **Provide resource requirements** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L359-L367)) All components should have [resource requirements](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits). - Generally, they should always request CPU and memory while only memory shall be limited (no CPU limits!). + Generally, they should always request CPU and memory, while only memory shall be limited (no CPU limits!). 2. **Define a `VerticalPodAutoscaler`** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L424-L460)) We typically perform vertical auto-scaling via the VPA managed by the [Kubernetes community](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler). - Each component should have a respective `VerticalPodAutoscaler` which "min allowed" resources, "auto update mode", and "requests only"-mode. - VPA is always enabled in garden or seed clusters while it is optional for shoot clusters. + Each component should have a respective `VerticalPodAutoscaler` with "min allowed" resources, "auto update mode", and "requests only"-mode. + VPA is always enabled in garden or seed clusters, while it is optional for shoot clusters. 3. **Define a `HorizontalPodAutoscaler` if needed** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/coredns/coredns.go#L689-L738)) - If your component is capable of scaling horizontally, the definition of a [`HorizontalPodAutoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) should be considered. + If your component is capable of scaling horizontally, you should consider defining a [`HorizontalPodAutoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/). ## Observability / Operations Productivity @@ -138,13 +138,13 @@ This document provides a checklist for them which you can walk through. Components should provide scrape configuration and alerting rules for Prometheus/Alertmanager if appropriate. This should be done inside a dedicated `monitoring.go` file. - Extensions should follow [this document](../extensions/logging-and-monitoring.md#extensions-monitoring-integration). + Extensions should follow the guidelines described in [Extensions Monitoring Integration](../extensions/logging-and-monitoring.md#extensions-monitoring-integration). 2. **Provide logging parsers and filters** ([example 1](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/coredns/logging.go), [example 2](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/gardenlet/controller/seed/seed/reconciler_reconcile.go#L563)) - Components should provide parsers and filters for fluent-bit if appropriate. + Components should provide parsers and filters for fluent-bit, if appropriate. This should be done inside a dedicated `logging.go` file. 
- Extensions should follow [this document](../extensions/logging-and-monitoring.md#fluent-bit-log-parsers-and-filters). + Extensions should follow the guidelines described in [Fluent-bit log parsers and filters](../extensions/logging-and-monitoring.md#fluent-bit-log-parsers-and-filters). 3. **Set the `revisionHistoryLimit` to `2` for `Deployment`s** ([example](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/metricsserver/metrics_server.go#L273)) @@ -157,5 +157,5 @@ This document provides a checklist for them which you can walk through. 5. **Configure automatic restarts in shoot maintenance time window** ([example 1](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/component/kubescheduler/kube_scheduler.go#L243), [example 2](https://github.com/gardener/gardener/blob/6a0fea86850ffec8937d1956bdf1a8ca6d074f3b/pkg/operation/botanist/coredns.go#L90-L107)) - Gardener offers to restart components during the maintenance time window, see [this document](../usage/shoot_maintenance.md#restart-control-plane-controllers) and [this document](../usage/shoot_maintenance.md#restart-some-core-addons). + Gardener offers to restart components during the maintenance time window. For more information, see [Restart Control Plane Controllers](../usage/shoot_maintenance.md#restart-control-plane-controllers) and [Restart Some Core Addons](../usage/shoot_maintenance.md#restart-some-core-addons). You can consider adding the needed label to your control plane component to get this automatic restart (probably not needed for most components). diff --git a/docs/development/dependencies.md b/docs/development/dependencies.md index f57b26fbb20..9bc423a6112 100644 --- a/docs/development/dependencies.md +++ b/docs/development/dependencies.md @@ -5,15 +5,15 @@ In order to add a new package dependency to the project, you can perform `go get ## Updating Dependencies -The `Makefile` contains a rule called `revendor` which performs `go mod tidy` and `go mod vendor`. -`go mod tidy` makes sure go.mod matches the source code in the module. It adds any missing modules necessary to build the current module's packages and dependencies, and it removes unused modules that don't provide any relevant packages. -`go mod vendor` resets the main module's vendor directory to include all packages needed to build and test all the main module's packages. It does not include test code for vendored packages. +The `Makefile` contains a rule called `revendor` which performs `go mod tidy` and `go mod vendor`: +- `go mod tidy` makes sure `go.mod` matches the source code in the module. It adds any missing modules necessary to build the current module's packages and dependencies, and it removes unused modules that don't provide any relevant packages. +- `go mod vendor` resets the main module's vendor directory to include all packages needed to build and test all the main module's packages. It does not include test code for vendored packages. ```bash make revendor ``` -The dependencies are installed into the `vendor` folder which **should be added** to the VCS. +The dependencies are installed into the `vendor` folder, which **should be added** to the VCS. :warning: Make sure that you test the code after you have updated the dependencies! 
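A typical flow for adding and vendoring a new dependency could look like this (the module path and version are placeholders):

```bash
# fetch the new dependency (placeholder module and version)
go get github.com/example/somelib@v1.2.3

# run `go mod tidy` and `go mod vendor` via the Makefile rule described above
make revendor

# the resulting changes to go.mod, go.sum, and the vendor folder belong into the commit
git add go.mod go.sum vendor/
```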
@@ -26,7 +26,7 @@ For example: - Library for building Gardener extensions: `extensions` - Gardener's Test Framework: `test/framework` -There are a few more folders in this repository (non-Go sources) that are reused across projects in the gardener organization: +There are a few more folders in this repository (non-Go sources) that are reused across projects in the Gardener organization: - GitHub templates: `.github` - Concourse / cc-utils related helpers: `hack/.ci` @@ -44,7 +44,7 @@ Currently, we don't have a mechanism yet for selectively syncing out these expor ## Import Restrictions -We want to make sure, that other projects can depend on this repository's "exported" packages without pulling in the entire repository (including "non-exported" packages) or a high number of other unwanted dependencies. +We want to make sure that other projects can depend on this repository's "exported" packages without pulling in the entire repository (including "non-exported" packages) or a high number of other unwanted dependencies. Hence, we have to be careful when adding new imports or references between our packages. > ℹ️ General rule of thumb: the mentioned "exported" packages should be as self-contained as possible and depend on as few other packages in the repository and other projects as possible. @@ -52,8 +52,9 @@ Hence, we have to be careful when adding new imports or references between our p In order to support that rule and automatically check compliance with that goal, we leverage [import-boss](https://github.com/kubernetes/code-generator/tree/master/cmd/import-boss). The tool checks all imports of the given packages (including transitive imports) against rules defined in `.import-restrictions` files in each directory. An import is allowed if it matches at least one allowed prefix and does not match any forbidden prefixes. -Note: `''` (the empty string) is a prefix of everything. -For more details, see: https://github.com/kubernetes/code-generator/tree/master/cmd/import-boss + +> Note: `''` (the empty string) is a prefix of everything. +For more details, see the [import-boss](https://github.com/kubernetes/code-generator/tree/master/cmd/import-boss/README.md) topic. `import-boss` is executed on every pull request and blocks the PR if it doesn't comply with the defined import restrictions. You can also run it locally using `make check`. diff --git a/docs/development/getting_started_locally.md b/docs/development/getting_started_locally.md index 64cfca52680..ee4bcd7b5d4 100644 --- a/docs/development/getting_started_locally.md +++ b/docs/development/getting_started_locally.md @@ -33,7 +33,7 @@ The Gardener components, however, will be run as regular processes on your machi and reload the terminal. -## Setting up the KinD cluster (garden and seed) +## Setting Up the KinD Cluster (Garden and Seed) ```bash make kind-up KIND_ENV=local @@ -55,7 +55,9 @@ With this, mirrored images don't have to be pulled again after recreating the cl The command also deploys a default [calico](https://github.com/projectcalico/calico) installation as the cluster's CNI implementation with `NetworkPolicy` support (the default `kindnet` CNI doesn't provide `NetworkPolicy` support). Furthermore, it deploys the [metrics-server](https://github.com/kubernetes-sigs/metrics-server) in order to support HPA and VPA on the seed cluster. 
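Before starting the Gardener components, you can optionally verify that the KinD cluster is reachable; the kubeconfig path below is the same one used by the e2e test command later in this guide:

```bash
export KUBECONFIG=$PWD/example/gardener-local/kind/local/kubeconfig
kubectl get nodes
```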
-## Setting up Gardener +## Setting Up Gardener + +In a terminal pane, run: ```bash make dev-setup # preparing the environment (without webhooks for now) @@ -63,34 +65,34 @@ kubectl wait --for=condition=ready pod -l run=etcd -n garden --timeout 2m # make start-apiserver # starting gardener-apiserver ``` -In a new terminal pane, run +In a new terminal pane, run: ```bash kubectl wait --for=condition=available apiservice v1beta1.core.gardener.cloud # wait for gardener-apiserver to be ready make start-admission-controller # starting gardener-admission-controller ``` -In a new terminal pane, run +In a new terminal pane, run: ```bash make dev-setup DEV_SETUP_WITH_WEBHOOKS=true # preparing the environment with webhooks make start-controller-manager # starting gardener-controller-manager ``` -(Optional): In a new terminal pane, run +(Optional): In a new terminal pane, run: ```bash make start-scheduler # starting gardener-scheduler ``` -In a new terminal pane, run +In a new terminal pane, run: ```bash make register-local-env # registering the local environment (CloudProfile, Seed, etc.) make start-gardenlet SEED_NAME=local # starting gardenlet ``` -In a new terminal pane, run +In a new terminal pane, run: ```bash make start-extension-provider-local # starting gardener-extension-provider-local @@ -98,9 +100,9 @@ make start-extension-provider-local # ℹ️ The [`provider-local`](../extensions/provider-local.md) is started with elevated privileges since it needs to manipulate your `/etc/hosts` file to enable you accessing the created shoot clusters from your local machine, see [this](../extensions/provider-local.md#dnsrecord) for more details. -## Creating a `Shoot` cluster +## Creating a `Shoot` Cluster -You can wait for the `Seed` to be ready by running +You can wait for the `Seed` to become ready by running: ```bash kubectl wait --for=condition=gardenletready --for=condition=extensionsready --for=condition=bootstrapped seed local --timeout=5m @@ -113,13 +115,13 @@ NAME STATUS PROVIDER REGION AGE VERSION K8S VERSION local Ready local local 4m42s vX.Y.Z-dev v1.21.1 ``` -In order to create a first shoot cluster, just run +In order to create a first shoot cluster, just run: ```bash kubectl apply -f example/provider-local/shoot.yaml ``` -You can wait for the `Shoot` to be ready by running +You can wait for the `Shoot` to be ready by running: ```bash kubectl wait --for=condition=apiserveravailable --for=condition=controlplanehealthy --for=condition=observabilitycomponentshealthy --for=condition=everynodeready --for=condition=systemcomponentshealthy shoot local -n garden-local --timeout=10m @@ -132,7 +134,7 @@ NAME CLOUDPROFILE PROVIDER REGION K8S VERSION HIBERNATION LAST OPER local local local local 1.21.0 Awake Create Processing (43%) healthy 94s ``` -(Optional): You could also execute a simple e2e test (creating and deleting a shoot) by running +(Optional): You could also execute a simple e2e test (creating and deleting a shoot) by running: ```shell make test-e2e-local-simple KUBECONFIG="$PWD/example/gardener-local/kind/local/kubeconfig" @@ -145,7 +147,7 @@ kubectl -n garden-local get secret local.kubeconfig -o jsonpath={.data.kubeconfi kubectl --kubeconfig=/tmp/kubeconfig-shoot-local.yaml get nodes ``` -## (Optional): Setting up a second seed cluster +## (Optional): Setting Up a Second Seed Cluster There are cases where you would want to create a second seed cluster in your local setup. For example, if you want to test the [control plane migration](../usage/control_plane_migration.md) feature. 
The following steps describe how to do that. @@ -168,7 +170,7 @@ make register-kind2-env # registering make start-gardenlet SEED_NAME=local2 # starting gardenlet for the local2 seed ``` -In a new terminal pane, run +In a new terminal pane, run: ```bash export KUBECONFIG=./example/gardener-local/kind/local2/kubeconfig # setting KUBECONFIG to point to second kind cluster @@ -180,29 +182,29 @@ make start-extension-provider-local \ HEALTH_BIND_ADDRESS=:8083 # starting gardener-extension-provider-local ``` -If you want to perform a control plane migration you can follow the steps outlined [here](../usage/control_plane_migration.md) to migrate the shoot cluster to the second seed you just created. +If you want to perform a control plane migration you can follow the steps outlined in the [Control Plane Migration](../usage/control_plane_migration.md) topic to migrate the shoot cluster to the second seed you just created. -## Deleting the `Shoot` cluster +## Deleting the `Shoot` Cluster ```shell ./hack/usage/delete shoot local garden-local ``` -## (Optional): Tear down the second seed cluster +## (Optional): Tear Down the Second Seed Cluster ```bash make tear-down-kind2-env make kind2-down ``` -## Tear down the Gardener environment +## Tear Down the Gardener Environment ```shell make tear-down-local-env make kind-down ``` -## Remote local setup +## Remote Local Setup Just like Prow is executing the KinD based integration tests in a K8s pod, it is possible to interactively run this KinD based Gardener development environment @@ -220,7 +222,7 @@ tmux -u a Please refer to the [TMUX documentation](https://github.com/tmux/tmux/wiki) for working effectively inside the remote-local-setup pod. -To access Grafana, Prometheus or other components in a browser, two port forwards are needed: +To access Grafana, Prometheus, or other components in a browser, two port forwards are needed: The port forward from the laptop to the pod: @@ -234,6 +236,6 @@ The port forward in the remote-local-setup pod to the respective component: k port-forward -n shoot--local--local deployment/grafana-operators 3000 ``` -## Further reading +## Related Links -This setup makes use of the local provider extension. You can read more about it in [this document](../extensions/provider-local.md). +- [Local Provider Extension](../extensions/provider-local.md) diff --git a/docs/development/high-availability.md b/docs/development/high-availability.md index 7accb574f1a..3b93c49a2fb 100644 --- a/docs/development/high-availability.md +++ b/docs/development/high-availability.md @@ -1,15 +1,15 @@ -# High Availability Of Deployed Components +# High Availability of Deployed Components -`gardenlet` and extension controllers are deploying components via `Deployment`s, `StatefulSet`s, etc. as part of the shoot control plane, or the seed or shoot system components. +`gardenlet`s and extension controllers are deploying components via `Deployment`s, `StatefulSet`s, etc. as part of the shoot control plane, or the seed or shoot system components. Some of the above component deployments must be further tuned to improve fault tolerance / resilience of the service. This document outlines what needs to be done to achieve this goal. -Please be forwarded to [this section](#convenient-application-of-these-rules), if you want to take a shortcut to the list of actions that require developers' attention. 
+Please be forwarded to the [Convenient Application Of These Rules](#convenient-application-of-these-rules) section, if you want to take a shortcut to the list of actions that require developers' attention. ## Seed Clusters The worker nodes of seed clusters can be deployed to one or multiple availability zones. -The `Seed` specification allows to provide the information which zones are available: +The `Seed` specification allows you to provide the information which zones are available: ```yaml spec: @@ -21,7 +21,7 @@ spec: - europe-1c ``` -Independent of the number of zones, seed system components like `gardenlet` or extension controllers themselves, or others like `etcd-druid`, `dependency-watchdog`, etc. should always be running with multiple replicas. +Independent of the number of zones, seed system components like the `gardenlet` or the extension controllers themselves, or others like `etcd-druid`, `dependency-watchdog`, etc. should always be running with multiple replicas. Concretely, all seed system components should respect the following conventions: @@ -42,7 +42,7 @@ Concretely, all seed system components should respect the following conventions: When the component has `>= 2` replicas ... - - ... then it should also have a `topologySpreadConstraint` ensuring the replicas are spread over the nodes: + - ... then it should also have a `topologySpreadConstraint`, ensuring the replicas are spread over the nodes: ```yaml spec: @@ -55,7 +55,7 @@ Concretely, all seed system components should respect the following conventions: Hence, the node spread is done on best-effort basis only. - - ... and the seed cluster has `>= 2` zones, then the component should also have a second `topologySpreadConstraint` ensuring the replicas are spread over the zones: + - ... and the seed cluster has `>= 2` zones, then the component should also have a second `topologySpreadConstraint`, ensuring the replicas are spread over the zones: ```yaml spec: @@ -71,7 +71,7 @@ Concretely, all seed system components should respect the following conventions: ## Shoot Clusters -The `Shoot` specification allows configuring "high availability" as well as the failure tolerance type for the control plane components, see [this document](../usage/shoot_high_availability.md) for details. +The `Shoot` specification allows configuring "high availability" as well as the failure tolerance type for the control plane components, see [Highly Available Shoot Control Plane](../usage/shoot_high_availability.md) for details. Regarding the seed cluster selection, the only constraint is that shoot clusters with failure tolerance type `zone` are only allowed to run on seed clusters with at least three zones. All other shoot clusters (non-HA or those with failure tolerance type `node`) can run on seed clusters with any number of zones. @@ -91,7 +91,7 @@ All control plane components should respect the following conventions: Apart from the above, there might be special cases where these rules do not apply, for example: - `etcd` is a server, though the most critical component of a cluster requiring a quorum to survive failures. Hence, it should have `3` replicas even when the failure tolerance is `node` only. - - `kube-apiserver` is scaled horizontally, hence the above numbers are the minimum values (even when the shoot cluster is not HA there might be multiple replicas). + - `kube-apiserver` is scaled horizontally, hence the above numbers are the minimum values (even when the shoot cluster is not HA, there might be multiple replicas). 
- **Topology Spread Constraints** @@ -127,8 +127,8 @@ All control plane components should respect the following conventions: The `gardenlet` annotates the shoot namespace in the seed cluster with the `high-availability-config.resources.gardener.cloud/zones` annotation. - - If the shoot cluster is non-HA or has failure tolerance type `node` then the value will be always exactly one zone (e.g., `high-availability-config.resources.gardener.cloud/zones=europe-1b`). - - If the shoot cluster has failure tolerance type `zone` then the value will always contain exactly three zones (e.g., `high-availability-config.resources.gardener.cloud/zones=europe-1a,europe-1b,europe-1c`). + - If the shoot cluster is non-HA or has failure tolerance type `node`, then the value will be always exactly one zone (e.g., `high-availability-config.resources.gardener.cloud/zones=europe-1b`). + - If the shoot cluster has failure tolerance type `zone`, then the value will always contain exactly three zones (e.g., `high-availability-config.resources.gardener.cloud/zones=europe-1a,europe-1b,europe-1c`). For backwards-compatibility, this annotation might contain multiple zones for shoot clusters created before `gardener/gardener@v1.60` and not having failure tolerance type `zone`. This is because their volumes might already exist in multiple zones, hence pinning them to only one zone would not work. @@ -153,7 +153,7 @@ All control plane components should respect the following conventions: ### System Components -The availability of system components is independent of the control plane since they run on the shoot worker nodes while the control plane components run on the seed worker nodes ([ref](../concepts/architecture.md)). +The availability of system components is independent of the control plane since they run on the shoot worker nodes while the control plane components run on the seed worker nodes (for more information, see the [Kubernetes architecture overview](../concepts/architecture.md)). Hence, it only depends on the number of availability zones configured in the shoot worker pools via `.spec.provider.workers[].zones`. Concretely, the highest number of zones of a worker pool with `systemComponents.allow=true` is considered. @@ -199,7 +199,7 @@ All system components should respect the following conventions: matchLabels: ... ``` -## Convenient Application Of These Rules +## Convenient Application of These Rules According to above scenarios and conventions, the `replicas`, `topologySpreadConstraints` or `affinity` settings of the deployed components might need to be adapted. @@ -217,30 +217,30 @@ spec: matchLabels: ... ``` -3. Add label `high-availability-config.resources.gardener.cloud/type` to `deployment`s or `statefulset`s as well as optionally involved `horizontalpodautoscaler`s or `HVPA`s where the following two values are possible: +3. Add the label `high-availability-config.resources.gardener.cloud/type` to `deployment`s or `statefulset`s, as well as optionally involved `horizontalpodautoscaler`s or `HVPA`s where the following two values are possible: - `controller` - `server` Type `server` is also preferred if a component is a controller and (webhook) server at the same time. -You can read more about the webhook's internals in [this document](../concepts/resource-manager.md#high-availability-config). +You can read more about the webhook's internals in [High Availability Config](../concepts/resource-manager.md#high-availability-config). 
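As a minimal sketch (with a made-up component name and image), the type label from step 3 is simply set on the workload resource, and the webhook takes care of the rest; `replicas` is intentionally omitted because the webhook injects it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-controller  # illustrative name
  labels:
    high-availability-config.resources.gardener.cloud/type: controller
spec:
  selector:
    matchLabels:
      app: example-controller
  template:
    metadata:
      labels:
        app: example-controller
    spec:
      containers:
      - name: example-controller
        image: example.registry/example-controller:v0.1.0  # illustrative image
```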
## `gardenlet` Internals -Make sure you have read above document about the webhook internals before continuing reading this section. +Make sure you have read the above document about the webhook internals before continuing reading this section. ### `Seed` Controller The `gardenlet` performs the following changes on all namespaces running seed system components: -- add label `high-availability-config.resources.gardener.cloud/consider=true`. -- add annotation `high-availability-config.resources.gardener.cloud/zones=` where `` is the list provided in `.spec.provider.zones[]` in the `Seed` specification. +- adds the label `high-availability-config.resources.gardener.cloud/consider=true`. +- adds the annotation `high-availability-config.resources.gardener.cloud/zones=`, where `` is the list provided in `.spec.provider.zones[]` in the `Seed` specification. -Note that neither the `high-availability-config.resources.gardener.cloud/failure-tolerance-type` nor the `high-availability-config.resources.gardener.cloud/zone-pinning` annotations are set, hence the node affinity would never be touched by the webhook. +Note that neither the `high-availability-config.resources.gardener.cloud/failure-tolerance-type`, nor the `high-availability-config.resources.gardener.cloud/zone-pinning` annotations are set, hence the node affinity would never be touched by the webhook. -The only exception to this rule are the istio ingress gateway namespaces. This includes the default istio ingress gateway when SNI is enabled as well as analogous namespaces for exposure classes and zone-specific istio ingress gateways. Those namespaces -will additionally be annotated with `high-availability-config.resources.gardener.cloud/zone-pinning` set to `true` resulting in the node affinities and the topology spread constraints being set. The replicas are not touched as the istio ingress gateways +The only exception to this rule are the istio ingress gateway namespaces. This includes the default istio ingress gateway when SNI is enabled, as well as analogous namespaces for exposure classes and zone-specific istio ingress gateways. Those namespaces +will additionally be annotated with `high-availability-config.resources.gardener.cloud/zone-pinning` set to `true`, resulting in the node affinities and the topology spread constraints being set. The replicas are not touched, as the istio ingress gateways are scaled by a horizontal autoscaler instance. ### `Shoot` Controller @@ -249,9 +249,9 @@ are scaled by a horizontal autoscaler instance. The `gardenlet` performs the following changes on the namespace running the shoot control plane components: -- add label `high-availability-config.resources.gardener.cloud/consider=true`. This makes the webhook mutate the replica count and the topology spread constraints. -- add annotation `high-availability-config.resources.gardener.cloud/failure-tolerance-type` with value equal to `.spec.controlPlane.highAvailability.failureTolerance.type` (or `""`, if `.spec.controlPlane.highAvailability=nil`). This makes the webhook mutate the node affinity according to the specified zone(s). -- add annotation `high-availability-config.resources.gardener.cloud/zones=` where `` is a ... +- adds the label `high-availability-config.resources.gardener.cloud/consider=true`. This makes the webhook mutate the replica count and the topology spread constraints. 
+- adds the annotation `high-availability-config.resources.gardener.cloud/failure-tolerance-type` with value equal to `.spec.controlPlane.highAvailability.failureTolerance.type` (or `""`, if `.spec.controlPlane.highAvailability=nil`). This makes the webhook mutate the node affinity according to the specified zone(s). +- adds the annotation `high-availability-config.resources.gardener.cloud/zones=<zones>`, where `<zones>` is a ... - ... random zone chosen from the `.spec.provider.zones[]` list in the `Seed` specification (always only one zone (even if there are multiple available in the seed cluster)) in case the `Shoot` has no HA setting (i.e., `spec.controlPlane.highAvailability=nil`) or when the `Shoot` has HA setting with failure tolerance type `node`. - ... list of three randomly chosen zones from the `.spec.provider.zones[]` list in the `Seed` specification in case the `Shoot` has HA setting with failure tolerance type `zone`. @@ -259,7 +259,7 @@ The `gardenlet` performs the following changes on all namespaces running shoot system components: -- add label `high-availability-config.resources.gardener.cloud/consider=true`. This makes the webhook mutate the replica count and the topology spread constraints. -- add annotation `high-availability-config.resources.gardener.cloud/zones=<zones>` where `<zones>` is the merged list of zones provided in `.zones[]` with `systemComponents.allow=true` for all worker pools in `.spec.provider.workers[]` in the `Shoot` specification. +- adds the label `high-availability-config.resources.gardener.cloud/consider=true`. This makes the webhook mutate the replica count and the topology spread constraints. +- adds the annotation `high-availability-config.resources.gardener.cloud/zones=<zones>` where `<zones>` is the merged list of zones provided in `.zones[]` with `systemComponents.allow=true` for all worker pools in `.spec.provider.workers[]` in the `Shoot` specification. -Note that neither the `high-availability-config.resources.gardener.cloud/failure-tolerance-type` nor the `high-availability-config.resources.gardener.cloud/zone-pinning` annotations are set, hence the node affinity would never be touched by the webhook. +Note that neither the `high-availability-config.resources.gardener.cloud/failure-tolerance-type`, nor the `high-availability-config.resources.gardener.cloud/zone-pinning` annotations are set, hence the node affinity would never be touched by the webhook. diff --git a/docs/development/kubernetes-clients.md b/docs/development/kubernetes-clients.md index 032cae0eb86..db77850b47f 100644 --- a/docs/development/kubernetes-clients.md +++ b/docs/development/kubernetes-clients.md @@ -1,10 +1,10 @@ # Kubernetes Clients in Gardener This document aims at providing a general developer guideline on different aspects of using Kubernetes clients in a large-scale distributed system and project like Gardener. -The points included here are not meant to be consulted as absolute rules, but rather as general rules of thumb, that allow developers to get a better feeling about certain gotchas and caveats. +The points included here are not meant to be consulted as absolute rules, but rather as general rules of thumb that allow developers to get a better feeling about certain gotchas and caveats. It should be updated with lessons learned from maintaining the project and running Gardener in production.
-**Prerequisites**: +## Prerequisites: Please familiarize yourself with the following basic Kubernetes API concepts first, if you're new to Kubernetes. A good understanding of these basics will help you better comprehend the following document. @@ -21,7 +21,7 @@ For historical reasons, you will find different kinds of Kubernetes clients in G ### Client-Go Clients [client-go](https://github.com/kubernetes/client-go) is the default/official client for talking to the Kubernetes API in Golang. -It features so called ["client sets"](https://github.com/kubernetes/client-go/blob/release-1.21/kubernetes/clientset.go#L72) for all built-in Kubernetes API groups and versions (e.g. `v1` (aka `core/v1`), `apps/v1`, etc.). +It features the so called ["client sets"](https://github.com/kubernetes/client-go/blob/release-1.21/kubernetes/clientset.go#L72) for all built-in Kubernetes API groups and versions (e.g. `v1` (aka `core/v1`), `apps/v1`, etc.). client-go clients are generated from the built-in API types using [client-gen](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/generating-clientset.md) and are composed of interfaces for every known API GroupVersionKind. A typical client-go usage looks like this: ```go @@ -38,7 +38,7 @@ _Important characteristics of client-go clients:_ - clients are specific to a given API GroupVersionKind, i.e., clients are hard-coded to corresponding API-paths (don't need to use the discovery API to map GVK to a REST endpoint path). - client's don't modify the passed in-memory object (e.g. `deployment` in the above example). Instead, they return a new in-memory object. - This means, controllers have to continue working with the new in-memory object or overwrite the shared object to not lose any state updates. + This means that controllers have to continue working with the new in-memory object or overwrite the shared object to not lose any state updates. ### Generated Client Sets for Gardener APIs @@ -60,7 +60,7 @@ updatedShoot, err := c.CoreV1beta1().Shoots("garden-my-project").Update(ctx, sho ### Controller-Runtime Clients [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) is a Kubernetes community project ([kubebuilder](https://github.com/kubernetes-sigs/kubebuilder) subproject) for building controllers and operators for custom resources. -Therefore, it features a generic client, that follows a different approach and does not rely on generated client sets. Instead, the client can be used for managing any Kubernetes resources (built-in or custom) homogeneously. +Therefore, it features a generic client that follows a different approach and does not rely on generated client sets. Instead, the client can be used for managing any Kubernetes resources (built-in or custom) homogeneously. For example: ```go @@ -76,7 +76,7 @@ err := c.Update(ctx, deployment) err = c.Update(ctx, shoot) ``` -A brief introduction to controller-runtime and its basic constructs can be found [here](https://pkg.go.dev/sigs.k8s.io/controller-runtime). +A brief introduction to controller-runtime and its basic constructs can be found at the [official Go documentation](https://pkg.go.dev/sigs.k8s.io/controller-runtime). _Important characteristics of controller-runtime clients:_ @@ -85,11 +85,11 @@ _Important characteristics of controller-runtime clients:_ A `runtime.Scheme` is basically a registry for Golang API types, defaulting and conversion functions. 
Schemes are usually provided per `GroupVersion` (see [this example](https://github.com/kubernetes/api/blob/release-1.21/apps/v1/register.go) for `apps/v1`) and can be combined to one single scheme for further usage ([example](https://github.com/gardener/gardener/blob/v1.29.0/pkg/client/kubernetes/types.go#L96)). In controller-runtime clients, schemes are used only for mapping a typed API object to its `GroupVersionKind`. - It then consults a `meta.RESTMapper` (also configured during client creation) for mapping the `GroupVersionKind` to a `RESTMapping`, which contains the `GroupVersionResource` and `Scope` (namespaced or cluster-scoped). From these values, the client can unambiguously determine the REST endpoint path of the corresponding API resource. For instance: `appsv1.DeploymentList` is available at `/apis/apps/v1/deployments` or `/apis/apps/v1/namespaces/<namespace>/deployments` respectively. - There are different `RESTMapper` implementations, but generally they are talking to the API server's discovery API for retrieving `RESTMappings` for all API resources known to the API server (either built-in, registered via API extension or `CustomResourceDefinition`s). - - The default implementation of controller-runtime (which Gardener uses as well), is the [dynamic `RESTMapper`](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.9.0/pkg/client/apiutil/dynamicrestmapper.go#L77). It caches discovery results (i.e. `RESTMappings`) in-memory and only re-discovers resources from the API server, when a client tries to use an unknown `GroupVersionKind`, i.e., when it encounters a `No{Kind,Resource}MatchError`. + - The default implementation of controller-runtime (which Gardener uses as well) is the [dynamic `RESTMapper`](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.9.0/pkg/client/apiutil/dynamicrestmapper.go#L77). It caches discovery results (i.e. `RESTMappings`) in-memory and only re-discovers resources from the API server when a client tries to use an unknown `GroupVersionKind`, i.e., when it encounters a `No{Kind,Resource}MatchError`. - The client writes back results from the API server into the passed in-memory object. - - This means, that controllers don't have to worry about copying back the results and should just continue to work on the given in-memory object. - - This is a nice and flexible pattern and helper functions should try to follow it wherever applicable. Meaning, if possible accept an object param, pass it down to clients and keep working on the same in-memory object instead of creating a new one in your helper function. - - The benefit is, that you don't lose updates to the API object and always have the last-known state in memory. Therefore, you don't have to read it again, e.g., for getting the current `resourceVersion` when working with [optimistic locking](#conflicts-concurrency-control-and-optimistic-locking), and thus minimize the chances for running into conflicts. + - This means that controllers don't have to worry about copying back the results and should just continue to work on the given in-memory object. + - This is a nice and flexible pattern, and helper functions should try to follow it wherever applicable. Meaning, if possible, accept an object param, pass it down to clients, and keep working on the same in-memory object instead of creating a new one in your helper function. + - The benefit is that you don't lose updates to the API object and always have the last-known state in memory.
Therefore, you don't have to read it again, e.g., for getting the current `resourceVersion` when working with [optimistic locking](#conflicts-concurrency-control-and-optimistic-locking), and thus minimize the chances for running into conflicts. - However, controllers *must not* use the same in-memory object concurrently in multiple goroutines. For example, decoding results from the API server in multiple goroutines into the same maps (e.g., labels, annotations) will cause panics because of "concurrent map writes". Also, reading from an in-memory API object in one goroutine while decoding into it in another goroutine will yield non-atomic reads, meaning data might be corrupt and represent a non-valid/non-existing API object. - Therefore, if you need to use the same in-memory object in multiple goroutines concurrently (e.g., shared state), remember to leverage proper synchronization techniques like channels, mutexes, `atomic.Value` and/or copy the object prior to use. The average controller, however, will not need to share in-memory API objects between goroutines, and it's typically an indicator that the controller's design should be improved. - The client decoder erases the object's `TypeMeta` (`apiVersion` and `kind` fields) after retrieval from the API server, see [kubernetes/kubernetes#80609](https://github.com/kubernetes/kubernetes/issues/80609), [kubernetes-sigs/controller-runtime#1517](https://github.com/kubernetes-sigs/controller-runtime/issues/1517). @@ -106,9 +106,9 @@ _Important characteristics of controller-runtime clients:_ Additionally, controller-runtime clients can be used to easily retrieve metadata-only objects or lists. This is useful for efficiently checking if at least one object of a given kind exists, or retrieving metadata of an object, if one is not interested in the rest (e.g., spec/status). The `Accept` header sent to the API server then contains `application/json;as=PartialObjectMetadataList;g=meta.k8s.io;v=v1`, which makes the API server only return metadata of the retrieved object(s). -This saves network traffic and cpu/memory load on the API server and client side. +This saves network traffic and CPU/memory load on the API server and client side. If the client fully lists all objects of a given kind including their spec/status, the resulting list can be quite large and easily exceed the controller's available memory. -That's why it's important to carefully check, if a full list is actually needed or if metadata-only list can be used instead. +That's why it's important to carefully check if a full list is actually needed, or if a metadata-only list can be used instead. For example: @@ -172,7 +172,7 @@ Informers are used in and created via several higher-level constructs: ### SharedInformerFactories, Listers The generated clients (built-in as well as extended) feature a `SharedInformerFactory` for every API group, which can be used to create and retrieve `Informers` for all GroupVersionKinds. -Similarly, it can be used to retrieve `Listers`, that allow getting and listing objects from the `Informer`'s cache. +Similarly, it can be used to retrieve `Listers` that allow getting and listing objects from the `Informer`'s cache. However, both of these constructs are only used for historical reasons, and we are in the process of migrating away from them in favor of cached controller-runtime clients (see [gardener/gardener#2414](https://github.com/gardener/gardener/issues/2414), [gardener/gardener#2822](https://github.com/gardener/gardener/issues/2822)).
Thus, they are described only briefly here. _Important characteristics of Listers:_ @@ -186,16 +186,16 @@ _Important characteristics of Listers:_ controller-runtime features a cache implementation that can be used equivalently as their clients. In fact, it implements a subset of the `client.Client` interface containing the `Get` and `List` functions. Under the hood, a `cache.Cache` dynamically creates `Informers` (i.e., opens watches) for every object GroupVersionKind that is being retrieved from it. -Note, that the underlying Informers of a controller-runtime cache (`cache.Cache`) and the ones of a `SharedInformerFactory` (client-go) are not related in any way. +Note that the underlying Informers of a controller-runtime cache (`cache.Cache`) and the ones of a `SharedInformerFactory` (client-go) are not related in any way. Both create `Informers` and watch objects on the API server individually. -This means, that if you read the same object from different cache implementations, you may receive different versions of the object because the watch connections of the individual Informers are not synced. +This means that if you read the same object from different cache implementations, you may receive different versions of the object because the watch connections of the individual Informers are not synced. > ⚠️ Because of this, controllers/reconcilers should get the object from the same cache in the reconcile loop, where the `EventHandler` was also added to set up the controller. For example, if a `SharedInformerFactory` is used for setting up the controller then read the object in the reconciler from the `Lister` instead of from a cached controller-runtime client. -By default, the `client.Client` created by a controller-runtime `Manager` is a `DelegatingClient`. It delegates `Get` and `List` calls to a `Cache` and all other calls to a client, that talks directly to the API server. Exceptions are requests with `*unstructured.Unstructured` objects and object kinds that were configured to be excluded from the cache in the `DelegatingClient`. +By default, the `client.Client` created by a controller-runtime `Manager` is a `DelegatingClient`. It delegates `Get` and `List` calls to a `Cache`, and all other calls to a client that talks directly to the API server. Exceptions are requests with `*unstructured.Unstructured` objects and object kinds that were configured to be excluded from the cache in the `DelegatingClient`. > ℹ️ -> `kubernetes.Interface.Client()` returns a `DelegatingClient` that uses the cache returned from `kubernetes.Interface.Cache()` under the hood. This means, all `Client()` usages need to be ready for cached clients and should be able to cater with stale cache reads. +> `kubernetes.Interface.Client()` returns a `DelegatingClient` that uses the cache returned from `kubernetes.Interface.Cache()` under the hood. This means that all `Client()` usages need to be ready for cached clients and should be able to cater with stale cache reads. _Important characteristics of cached controller-runtime clients:_ @@ -229,20 +229,20 @@ Here are some general guidelines on choosing whether to read from a cache or not - Track the actions you took, e.g., when creating objects with `generateName` (this is what the `ReplicaSet` controller does [3]). The actions can be tracked in memory and repeated if the expected watch events don't occur after a given amount of time. 
- Always try to write controllers with the assumption that data will only be eventually correct and can be slightly out of date (even if read directly from the API server!). - If there is already some other code that needs a cache (e.g., a controller watch), reuse it instead of doing extra direct reads. - - Don't read an object again if you just sent a write request. Write requests (`Create`, `Update`, `Patch` and `Delete`) don't interact with the cache. Hence, use the current state that the API server returned (filled into the passed in-memory object), which is basically a "free direct read", instead of reading the object again from a cache, because this will probably set back the object to an older `resourceVersion`. + - Don't read an object again if you just sent a write request. Write requests (`Create`, `Update`, `Patch` and `Delete`) don't interact with the cache. Hence, use the current state that the API server returned (filled into the passed in-memory object), which is basically a "free direct read" instead of reading the object again from a cache, because this will probably set back the object to an older `resourceVersion`. - If you are concerned about the impact of the resulting cache, try to minimize that by using filtered or metadata-only watches. - If watching and caching an object type is not feasible, for example because there will be a lot of updates, and you are only interested in the object every ~5m, or because it will blow up the controllers memory footprint, fallback to a direct read. This can either be done by disabling caching the object type generally or doing a single request via an `APIReader`. In any case, please bear in mind that every direct API call results in a [quorum read from etcd](https://kubernetes.io/docs/reference/using-api/api-concepts/#the-resourceversion-parameter), which can be costly in a heavily-utilized cluster and impose significant scalability limits. Thus, always try to minimize the impact of direct calls by filtering results by namespace or labels, limiting the number of results and/or using metadata-only calls. [2] The `Deployment` controller uses the pattern `-` for naming `ReplicaSets`. This means, the name of a `ReplicaSet` it tries to create/update/delete at any given time is deterministically calculated based on the `Deployment` object. By this, it is insusceptible to stale reads from its `ReplicaSets` cache. -[3] In simple terms, the `ReplicaSet` controller tracks its `CREATE pod` actions as follows: when creating new `Pods`, it increases a counter of expected `ADDED` watch events for the corresponding `ReplicaSet`. As soon as such events arrive, it decreases the counter accordingly. It only creates new `Pods` for a given `ReplicaSet`, once all expected events occurred (counter is back to zero) or a timeout occurred. This way, it prevents creating more `Pods` than desired because of stale cache reads and makes the controller eventually consistent. +[3] In simple terms, the `ReplicaSet` controller tracks its `CREATE pod` actions as follows: when creating new `Pods`, it increases a counter of expected `ADDED` watch events for the corresponding `ReplicaSet`. As soon as such events arrive, it decreases the counter accordingly. It only creates new `Pods` for a given `ReplicaSet` once all expected events occurred (counter is back to zero) or a timeout has occurred. This way, it prevents creating more `Pods` than desired because of stale cache reads and makes the controller eventually consistent. 
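To make the metadata-only and `APIReader` options from the list above a bit more concrete, here is a minimal sketch (not taken from the Gardener code base) of a metadata-only list and a single direct read. It assumes a cached controller-runtime client `c`, a manager `mgr`, a `ctx`, and the usual `metav1`, `client`, and `gardencorev1beta1` package aliases; the namespace, name, and limit are made-up placeholders:

```go
// Metadata-only list: the API server returns only object metadata, which keeps
// the response (and any backing watch/cache) small.
shootList := &metav1.PartialObjectMetadataList{}
shootList.SetGroupVersionKind(gardencorev1beta1.SchemeGroupVersion.WithKind("ShootList"))
if err := c.List(ctx, shootList, client.InNamespace("garden-my-project"), client.Limit(100)); err != nil {
	return err
}

// Direct read bypassing the cache: the APIReader talks to the API server
// directly (a quorum read from etcd), so keep it filtered and use it sparingly.
shoot := &gardencorev1beta1.Shoot{}
if err := mgr.GetAPIReader().Get(ctx, client.ObjectKey{Namespace: "garden-my-project", Name: "my-shoot"}, shoot); err != nil {
	return err
}
```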
-## Conflicts, Concurrency Control and Optimistic Locking +## Conflicts, Concurrency Control, and Optimistic Locking Every Kubernetes API object contains the `metadata.resourceVersion` field, which identifies an object's version in the backing data store, i.e., etcd. Every write to an object in etcd results in a newer `resourceVersion`. This field is mainly used for concurrency control on the API server in an optimistic locking fashion, but also for efficient resumption of interrupted watch connections. -Optimistic locking in the Kubernetes API sense means that when a client wants to update an API object then it includes the object's `resourceVersion` in the request to indicate the object's version the modifications are based on. +Optimistic locking in the Kubernetes API sense means that when a client wants to update an API object, then it includes the object's `resourceVersion` in the request to indicate the object's version the modifications are based on. If the `resourceVersion` in etcd has not changed in the meantime, the update request is accepted by the API server and the updated object is written to etcd. If the `resourceVersion` sent by the client does not match the one of the object stored in etcd, there were concurrent modifications to the object. Consequently, the request is rejected with a conflict error (status code `409`, API reason `Conflict`), for example: @@ -262,41 +262,41 @@ If the `resourceVersion` sent by the client does not match the one of the object } ``` -This concurrency control is an important mechanism in Kubernetes as there are typically multiple clients acting on API objects at the same time (humans, different controllers, etc.). If a client receives a conflict error, it should read the object's latest version from the API server, make the modifications based on the newest changes and retry the update. +This concurrency control is an important mechanism in Kubernetes as there are typically multiple clients acting on API objects at the same time (humans, different controllers, etc.). If a client receives a conflict error, it should read the object's latest version from the API server, make the modifications based on the newest changes, and retry the update. The reasoning behind this is that a client might choose to make different decisions based on the concurrent changes made by other actors compared to the outdated version that it operated on. _Important points about concurrency control and conflicts:_ - The `resourceVersion` field carries a string value and clients must not assume numeric values (the type and structure of versions depend on the backing data store). This means clients may compare `resourceVersion` values to detect whether objects were changed. But they must not compare `resourceVersion`s to figure out which one is newer/older, i.e., no greater/less-than comparisons are allowed. -- By default, update calls (e.g. via client-go and controller-runtime clients) use optimistic locking as the passed in-memory usually object contains the latest `resourceVersion` known to the controller which is then also sent to the API server. -- API servers can also choose to accept update calls without optimistic locking (i.e., without a `resourceVersion` in the object's metadata) for any given resource. However, sending update requests without optimistic locking is strongly discouraged as doing so overwrites the entire object discarding any concurrent changes made to it. +- By default, update calls (e.g. 
via client-go and controller-runtime clients) use optimistic locking, as the passed in-memory object usually contains the latest `resourceVersion` known to the controller, which is then also sent to the API server. +- API servers can also choose to accept update calls without optimistic locking (i.e., without a `resourceVersion` in the object's metadata) for any given resource. However, sending update requests without optimistic locking is strongly discouraged, as doing so overwrites the entire object, discarding any concurrent changes made to it. - On the other side, patch requests can always be executed either with or without optimistic locking, by (not) including the `resourceVersion` in the patched object's metadata. Sending patch requests without optimistic locking might be safe and even desirable as a patch typically updates only a specific section of the object. However, there are also situations where patching without optimistic locking is not safe (see below). ### Don’t Retry on Conflict Similar to how a human would typically handle a conflict error, there are helper functions implementing `RetryOnConflict`-semantics, i.e., try an update call, then re-read the object if a conflict occurs, apply the modification again and retry the update. However, controllers should generally *not* use `RetryOnConflict`-semantics. Instead, controllers should abort their current reconciliation run and let the queue handle the conflict error with exponential backoff. -The reasoning behind this is, that a conflict error indicates that the controller has operated on stale data and might have made wrong decisions earlier on in the reconciliation. +The reasoning behind this is that a conflict error indicates that the controller has operated on stale data and might have made wrong decisions earlier on in the reconciliation. When using a helper function that implements `RetryOnConflict`-semantics, the controller doesn't check which fields were changed and doesn't revise its previous decisions accordingly. Instead, retrying on conflict basically just ignores any conflict error and blindly applies the modification. To properly solve the conflict situation, controllers should immediately return with the error from the update call. This will cause retries with exponential backoff so that the cache has a chance to observe the latest changes to the object. -In a later run, the controller will then make correct decisions based on the newest version of the object, not run into conflict errors and will then be able to successfully reconcile the object. This way, the controller becomes eventually consistent. +In a later run, the controller will then make correct decisions based on the newest version of the object, not run into conflict errors, and will then be able to successfully reconcile the object. This way, the controller becomes eventually consistent. The other way to solve the situation is to modify objects without optimistic locking in order to avoid running into a conflict in the first place (only if this is safe). This can be a preferable solution for controllers with long-running reconciliations (which is actually an anti-pattern but quite unavoidable in some of Gardener's controllers). -Aborting the entire reconciliation run is rather undesirable in such cases as it will add a lot of unnecessary waiting time for end users and overhead in terms of compute and network usage.
+Aborting the entire reconciliation run is rather undesirable in such cases, as it will add a lot of unnecessary waiting time for end users and overhead in terms of compute and network usage. -However, in any case retrying on conflict is probably not the right option to solve the situation (there are some correct use cases for it, though, they are very rare). Hence, don't retry on conflict. +However, in any case, retrying on conflict is probably not the right option to solve the situation (there are some correct use cases for it, though, they are very rare). Hence, don't retry on conflict. ### To Lock or Not to Lock -As explained before, conflicts are actually important and prevent clients from doing wrongful concurrent updates. This means, conflicts are not something we generally want to avoid or ignore. +As explained before, conflicts are actually important and prevent clients from doing wrongful concurrent updates. This means that conflicts are not something we generally want to avoid or ignore. However, in many cases controllers are exclusive owners of the fields they want to update and thus it might be safe to run without optimistic locking. For example, the gardenlet is the exclusive owner of the `spec` section of the Extension resources it creates on behalf of a Shoot (e.g., the `Infrastructure` resource for creating VPC, etc.). Meaning, it knows the exact desired state and no other actor is supposed to update the Infrastructure's `spec` fields. When the gardenlet now updates the Infrastructures `spec` section as part of the Shoot reconciliation, it can simply issue a `PATCH` request that only updates the `spec` and runs without optimistic locking. -If another controller concurrently updated the object in the meantime (e.g., the `status` section), the `resourceVersion` got changed which would cause a conflict error if running with optimistic locking. +If another controller concurrently updated the object in the meantime (e.g., the `status` section), the `resourceVersion` got changed, which would cause a conflict error if running with optimistic locking. However, concurrent `status` updates would not change the gardenlet's mind on the desired `spec` of the Infrastructure resource as it is determined only by looking at the Shoot's specification. If the `spec` section was changed concurrently, it's still fine to overwrite it because the gardenlet should reconcile the `spec` back to its desired state. @@ -307,7 +307,7 @@ Obviously, this applies only to patch requests that modify only a specific set o In such cases, it's even desirable to run without optimistic locking as it will be more performant and save retries. If certain requests are made with high frequency and have a good chance of causing conflicts, retries because of optimistic locking can cause a lot of additional network traffic in a large-scale Gardener installation. -## Updates, Patches, Server-side Apply +## Updates, Patches, Server-Side Apply There are different ways of modifying Kubernetes API objects. The following snippet demonstrates how to do a given modification with the most frequently used options using a controller-runtime client: @@ -347,16 +347,16 @@ _Important characteristics of the shown request types:_ patch = client.StrategicMergeFrom(shoot.DeepCopy(), client.MergeFromWithOptimisticLock{}) // ... ``` -- Patch requests only contain the changes made to the in-memory object between the copy passed to `client.*MergeFrom` and the object passed to `Client.Patch()`. 
The diff is calculated on the client-side based on the in-memory objects only. This means, if in the meantime some fields were changed on the API server to a different value than the one on the client-side, the fields will not be changed back as long as they are not changed on the client-side as well (there will be no diff in memory). -- Thus, if you want to ensure a given state using patch requests, always read the object first before patching it, as there will be no diff otherwise, meaning the patch will be empty. Also see [gardener/gardener#4057](https://github.com/gardener/gardener/pull/4057) and comments in [gardener/gardener#4027](https://github.com/gardener/gardener/pull/4027). +- Patch requests only contain the changes made to the in-memory object between the copy passed to `client.*MergeFrom` and the object passed to `Client.Patch()`. The diff is calculated on the client-side based on the in-memory objects only. This means that if in the meantime some fields were changed on the API server to a different value than the one on the client-side, the fields will not be changed back as long as they are not changed on the client-side as well (there will be no diff in memory). +- Thus, if you want to ensure a given state using patch requests, always read the object first before patching it, as there will be no diff otherwise, meaning the patch will be empty. For more information, see [gardener/gardener#4057](https://github.com/gardener/gardener/pull/4057) and the comments in [gardener/gardener#4027](https://github.com/gardener/gardener/pull/4027). - Also, always send updates and patch requests even if your controller hasn't made any changes to the current state on the API server. I.e., don't make any optimization for preventing empty patches or no-op updates. There might be mutating webhooks in the system that will modify the object and that rely on update/patch requests being sent (even if they are no-op). Gardener's extension concept makes heavy use of mutating webhooks, so it's important to keep this in mind. - JSON merge patches always replace lists as a whole and don't merge them. Keep this in mind when operating on lists with merge patch requests. If the controller is the exclusive owner of the entire list, it's safe to run without optimistic locking. Though, if you want to prevent overwriting concurrent changes to the list or its items made by other actors (e.g., additions/removals to the `metadata.finalizers` list), enable optimistic locking. -- Strategic merge patches are able to make more granular modifications to lists and their elements without replacing the entire list. It uses Golang struct tags of the API types to determine which and how lists should be merged. See [this document](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/) or the [strategic merge patch documentation](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/strategic-merge-patch.md) for more in-depth explanations and comparison with JSON merge patches. +- Strategic merge patches are able to make more granular modifications to lists and their elements without replacing the entire list. It uses Golang struct tags of the API types to determine which and how lists should be merged. 
See [Update API Objects in Place Using kubectl patch](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/) or the [strategic merge patch documentation](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/strategic-merge-patch.md) for more in-depth explanations and comparison with JSON merge patches. With this, controllers *might* be able to issue patch requests for individual list items without optimistic locking, even if they are not exclusive owners of the entire list. Remember to check the `patchStrategy` and `patchMergeKey` struct tags of the fields you want to modify before blindly adding patch requests without optimistic locking. - Strategic merge patches are only supported by built-in Kubernetes resources and custom resources served by Extension API servers. Strategic merge patches are not supported by custom resources defined by `CustomResourceDefinition`s (see [this comparison](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#advanced-features-and-flexibility)). In that case, fall back to JSON merge patches. - [Server-side Apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/) is yet another mechanism to modify API objects, which is supported by all API resources (in newer Kubernetes versions). However, it has a few problems and more caveats preventing us from using it in Gardener at the time of writing. See [gardener/gardener#4122](https://github.com/gardener/gardener/issues/4122) for more details. -> Generally speaking, patches are often the better option compared to update requests because they can save network traffic, encoding/decoding effort and avoid conflicts under the presented conditions. +> Generally speaking, patches are often the better option compared to update requests because they can save network traffic, encoding/decoding effort, and avoid conflicts under the presented conditions. > If choosing a patch type, consider which type is supported by the resource you're modifying and what will happen in case of a conflict. Consider whether your modification is safe to run without optimistic locking. > However, there is no simple rule of thumb on which patch type to choose. @@ -369,7 +369,7 @@ That's why usage of this function was completely replaced in [gardener/gardener# `controllerutil.CreateOrPatch` is similar to `CreateOrUpdate` but does a patch request instead of an update request. It has the same drawback as `CreateOrUpdate` regarding no-op updates. Also, controllers can't use optimistic locking or strategic merge patches when using `CreateOrPatch`. -Another reason for avoiding use of this function is, that it also implicitly patches the status section if it was changed, which is confusing for others reading the code. To accomplish this, the func does some back and forth conversion, comparison and checks, which are unnecessary in most of our cases and simply wasted CPU cycles and complexity we want to avoid. +Another reason for avoiding the use of this function is that it also implicitly patches the status section if it was changed, which is confusing for others reading the code. To accomplish this, the func does some back-and-forth conversion, comparison, and checks, which are unnecessary in most of our cases and simply waste CPU cycles and add complexity we want to avoid.
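For comparison, doing the calls directly in the controller (instead of going through `CreateOrPatch`) could look roughly like the following sketch; it assumes a controller-runtime client `c`, a `ctx`, and the usual package aliases, and the object, namespace, and label key are placeholders:

```go
// Read the current state (usually from the cache), remember a copy as the patch
// base, then patch only the fields this controller is responsible for.
shoot := &gardencorev1beta1.Shoot{}
if err := c.Get(ctx, client.ObjectKey{Namespace: "garden-my-project", Name: "my-shoot"}, shoot); err != nil {
	return err
}

patch := client.MergeFrom(shoot.DeepCopy())
if shoot.Labels == nil {
	shoot.Labels = map[string]string{}
}
shoot.Labels["example.local/touched"] = "true" // placeholder modification
if err := c.Patch(ctx, shoot, patch); err != nil {
	return err
}
```

If the status needs to change as well, a separate, explicit `c.Status().Patch(...)` call makes that intent visible to readers instead of hiding it in a helper.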
There were some `Try{Update,UpdateStatus,Patch,PatchStatus}` helper functions in Gardener that were already removed by [gardener/gardener#4378](https://github.com/gardener/gardener/pull/4378) but are still used in some extension code at the time of writing. The reason for eliminating these functions is that they implement `RetryOnConflict`-semantics. Meaning, they first get the object, mutate it, then try to update and retry if a conflict error occurs. @@ -380,7 +380,7 @@ For the reasons explained above, there are similar helper functions that accompl These can be safely used as replacements for the aforementioned helper funcs. If they are not fitting for your use case, for example because you need to use optimistic locking, just do the appropriate calls in the controller directly. -## Further Resources +## Related Links - [Kubernetes Client usage in Gardener](https://www.youtube.com/watch?v=RPsUo925PUA&t=40s) (Community Meeting talk, 2020-06-26) diff --git a/docs/development/local_setup.md b/docs/development/local_setup.md index f661a0f5b46..a183d3cf25e 100644 --- a/docs/development/local_setup.md +++ b/docs/development/local_setup.md @@ -16,7 +16,7 @@ This guide is split into three main parts: * [Building and starting Gardener components locally](#start-gardener-locally) * [Using your local Gardener setup to create a Shoot](#create-a-shoot) -## Limitations of the local development setup +## Limitations of the Local Development Setup You can run Gardener (API server, controller manager, scheduler, gardenlet) against any local Kubernetes cluster, however, your seed and shoot clusters must be deployed to a cloud provider. Currently, it is not possible to run Gardener entirely isolated from any cloud provider. This means that to be able to create Shoot clusters you need to register an external Seed cluster (e.g., one created in AWS). @@ -102,7 +102,7 @@ brew install jq brew install parallel ``` -## [macOS only] Install GNU core utilities +## [macOS Only] Install GNU Core Utilities When running on macOS, install the GNU core utilities and friends: @@ -119,11 +119,11 @@ export PATH=/usr/local/opt/gnu-tar/libexec/gnubin:$PATH export PATH=/usr/local/opt/grep/libexec/gnubin:$PATH ``` -## [Windows only] WSL2 +## [Windows Only] WSL2 Apart from Linux distributions and macOS, the local gardener setup can also run on the Windows Subsystem for Linux 2. -While WSL1, plain docker for windows and various Linux distributions and local Kubernetes environments may be supported, this setup was verified with: +While WSL1, plain Docker for Windows and various Linux distributions and local Kubernetes environments may be supported, this setup was verified with: * [WSL2](https://docs.microsoft.com/en-us/windows/wsl/wsl2-index) * [Docker Desktop WSL2 Engine](https://docs.docker.com/docker-for-windows/wsl/) * [Ubuntu 18.04 LTS on WSL2](https://ubuntu.com/blog/ubuntu-on-wsl-2-is-generally-available) @@ -131,9 +131,9 @@ While WSL1, plain docker for windows and various Linux distributions and local K The Gardener repository and all the above-mentioned tools (git, golang, kubectl, ...) should be installed in your WSL2 distro, according to the distribution-specific Linux installation instructions. -# Start Gardener locally +# Start Gardener Locally -## Get the sources +## Get the Sources Clone the repository from GitHub into your `$GOPATH`. @@ -150,9 +150,9 @@ cd gardener ℹ️ In the following guide, you have to define the configuration (`CloudProfile`s, `SecretBinding`s, `Seed`s, etc.)
manually for the infrastructure environment you want to develop against. Additionally, you have to register the respective Gardener extensions manually. -If you are rather looking for a quick start guide to develop entirely locally on your machine (no real cloud provider or infrastructure involved) then you should rather follow [this guide](getting_started_locally.md). +If you are rather looking for a quick start guide to develop entirely locally on your machine (no real cloud provider or infrastructure involved), then you should follow [this guide](getting_started_locally.md). -### Start a local kubernetes cluster +### Start a Local Kubernetes Cluster For the development of Gardener you need a Kubernetes API server on which you can register Gardener's own Extension API Server as `APIService`. This cluster doesn't need any worker nodes to run pods; therefore, you can use the "nodeless Garden cluster setup" residing in `hack/local-garden`. This will start all minimally required components of a Kubernetes cluster (`etcd`, `kube-apiserver`, `kube-controller-manager`) and an `etcd` instance for the `gardener-apiserver` as Docker containers. This is the easiest way to get your @@ -182,9 +182,9 @@ make local-garden-down ```
- Alternative: Using a local kubernetes cluster + Alternative: Using a local Kubernetes cluster - Instead of starting a kubernetes API server and etcd as docker containers, you can also opt for running a local kubernetes cluster, provided by e.g. [minikube](https://minikube.sigs.k8s.io/docs/start/), [kind](https://kind.sigs.k8s.io/docs/user/quick-start/) or docker desktop. + Instead of starting a Kubernetes API server and etcd as docker containers, you can also opt for running a local Kubernetes cluster, provided by e.g. [minikube](https://minikube.sigs.k8s.io/docs/start/), [kind](https://kind.sigs.k8s.io/docs/user/quick-start/) or docker desktop. > Note: Gardener requires self-contained kubeconfig files because of a [security issue](https://banzaicloud.com/blog/kubeconfig-security/). You can configure your minikube to create self-contained kubeconfig files via: > ```bash @@ -198,7 +198,7 @@ make local-garden-down
- Alternative: Using a remote kubernetes cluster + Alternative: Using a remote Kubernetes cluster For some testing scenarios, you may want to use a remote cluster instead of a local one as your Garden cluster. To do this, you can use the "remote Garden cluster setup" residing in `hack/remote-garden`. This will start an `etcd` instance for the `gardener-apiserver` as a Docker container, and open tunnels for accessing local gardener components from the remote cluster. @@ -225,7 +225,7 @@ To close the tunnels and remove the locally-running Docker containers, run: make remote-garden-down ``` -ℹ️ [Optional] If you want to use the remote Garden cluster setup with the `SeedAuthorization` feature you have to adapt the `kube-apiserver` process of your remote Garden cluster. To do this, perform the following steps after running `make remote-garden-up`: +ℹ️ [Optional] If you want to use the remote Garden cluster setup with the `SeedAuthorization` feature, you have to adapt the `kube-apiserver` process of your remote Garden cluster. To do this, perform the following steps after running `make remote-garden-up`: * Create an [authorization webhook configuration file](https://kubernetes.io/docs/reference/access-authn-authz/webhook/#configuration-file-format) using the IP of the `garden/quic-server` pod running in your remote Garden cluster and port 10444 that tunnels to your locally running `gardener-admission-controller` process. @@ -298,7 +298,7 @@ apiservice.apiregistration.k8s.io/v1alpha1.seedmanagement.gardener.cloud created apiservice.apiregistration.k8s.io/v1alpha1.settings.gardener.cloud created ``` -ℹ️ [Optional] If you want to enable logging, in the Gardenlet configuration add: +ℹ️ [Optional] If you want to enable logging, in the gardenlet configuration add: ```yaml logging: enabled: true @@ -307,7 +307,7 @@ logging: The Gardener exposes the API servers of Shoot clusters via Kubernetes services of type `LoadBalancer`. In order to establish stable endpoints (robust against changes of the load balancer address), it creates DNS records pointing to these load balancer addresses. They are used internally and by all cluster components to communicate. You need to have control over a domain (or subdomain) for which these records will be created. -Please provide an *internal domain secret* (see [this](../../example/10-secret-internal-domain.yaml) for an example) which contains credentials with the proper privileges. Further information can be found [here](../usage/configuration.md). +Please provide an *internal domain secret* (see [this](../../example/10-secret-internal-domain.yaml) for an example) which contains credentials with the proper privileges. Further information can be found in [Gardener Configuration and Usage](../usage/configuration.md). ```bash kubectl apply -f example/10-secret-internal-domain-unmanaged.yaml @@ -316,7 +316,7 @@ secret/internal-domain-unmanaged created ### Run the Gardener -Next, run the Gardener API Server, the Gardener Controller Manager (optionally), the Gardener Scheduler (optionally), and the Gardenlet in different terminal windows/panes using rules in the `Makefile`. +Next, run the Gardener API Server, the Gardener Controller Manager (optionally), the Gardener Scheduler (optionally), and the gardenlet in different terminal windows/panes using rules in the `Makefile`. ```bash make start-apiserver @@ -375,7 +375,7 @@ to operate against your local running Gardener API Server. The steps below describe the general process of creating a Shoot. 
Keep in mind that the steps do not provide full example manifests. The reader needs to check the provider documentation and adapt the manifests accordingly. -#### 1. Copy the example manifests +#### 1. Copy the Example Manifests The next steps require modifications of the example manifests. These modifications are part of local setup and should not be `git push`-ed. To avoid interfering with git, let's copy the example manifests to `dev/` which is ignored by git. @@ -408,11 +408,11 @@ The `CloudProfile` resource is provider specific and describes the underlying cl kubectl apply -f dev/30-cloudprofile.yaml ``` -#### 4. Install necessary Gardener Extensions +#### 4. Install Necessary Gardener Extensions -The [Known Extension Implementations](../../extensions/README.md#known-extension-implementations) section contains a list of available extension implementations. You need to create a ControllerRegistration and ControllerDeployment for +The [Known Extension Implementations](../../extensions/README.md#known-extension-implementations) section contains a list of available extension implementations. You need to create a ControllerRegistration and ControllerDeployment for: * at least one infrastructure provider -* a dns provider (if the DNS for the Seed is not disabled) +* a DNS provider (if the DNS for the Seed is not disabled) * at least one operating system extension * at least one network plugin extension @@ -423,13 +423,13 @@ kubectl apply -f https://raw.githubusercontent.com/gardener/gardener-extension-p ``` Instead of updating extensions manually you can use [Gardener Extensions Manager](https://github.com/gardener/gem) to install and update extension controllers. This is especially useful if you want to keep and maintain your development setup for a longer time. -Also, please refer to [this document](../extensions/controllerregistration.md) for further information about how extensions are registered in case you want to use other versions than the latest releases. +Also, please refer to [Registering Extension Controllers](../extensions/controllerregistration.md) for further information about how extensions are registered in case you want to use other versions than the latest releases. #### 5. Register a Seed Shoot control planes run in seed clusters, so we need to create our first Seed now. -Check the corresponding example manifest `dev/40-secret-seed.yaml` and `dev/50-seed.yaml`. Update `dev/40-secret-seed.yaml` with base64 encoded kubeconfig of the cluster that will be used as Seed (the scope of the permissions should be identical to the kubeconfig that the Gardenlet creates during bootstrapping - for now, `cluster-admin` privileges are recommended). +Check the corresponding example manifests `dev/40-secret-seed.yaml` and `dev/50-seed.yaml`. Update `dev/40-secret-seed.yaml` with a base64-encoded kubeconfig of the cluster that will be used as Seed (the scope of the permissions should be identical to the kubeconfig that the gardenlet creates during bootstrapping - for now, `cluster-admin` privileges are recommended). ```bash kubectl apply -f dev/40-secret-seed.yaml @@ -441,9 +441,9 @@ Adapt `dev/50-seed.yaml` - adjust `.spec.secretRef` to refer the newly created S kubectl apply -f dev/50-seed.yaml ``` -#### 6. Start Gardenlet +#### 6. Start the gardenlet -Once the Seed is created, start the Gardenlet to reconcile it. The `make start-gardenlet` command will automatically configure the local Gardenlet process to use the Seed and its kubeconfig.
If you have multiple Seeds, you have to specify which to use by setting the `SEED_NAME` environment variable like in `make start-gardenlet SEED_NAME=my-first-seed`. +Once the Seed is created, start the gardenlet to reconcile it. The `make start-gardenlet` command will automatically configure the local gardenlet process to use the Seed and its kubeconfig. If you have multiple Seeds, you have to specify which to use by setting the `SEED_NAME` environment variable like in `make start-gardenlet SEED_NAME=my-first-seed`. ```bash make start-gardenlet @@ -458,7 +458,7 @@ time="2019-11-06T15:24:18+02:00" level=info msg="Seed controller initialized." [...] -The Gardenlet will now reconcile the Seed. Check the progess from time to time until it's `Ready`: +The gardenlet will now reconcile the Seed. Check the progress from time to time until it's `Ready`: ```bash kubectl get seed @@ -468,14 +468,14 @@ seed-aws Ready aws eu-west-1 4m v1.61.0-dev v1.24.8 #### 7. Create a Shoot -A Shoot requires a SecretBinding. The SecretBinding refers to a Secret that contains the cloud provider credentials. The Secret data keys are provider specific and you need to check the documentation of the provider to find out which data keys are expected (for example for AWS the related documentation can be found [here](https://github.com/gardener/gardener-extension-provider-aws/blob/master/docs/usage-as-end-user.md#provider-secret-data)). Adapt `dev/70-secret-provider.yaml` and `dev/80-secretbinding.yaml` and apply them. +A Shoot requires a SecretBinding. The SecretBinding refers to a Secret that contains the cloud provider credentials. The Secret data keys are provider-specific and you need to check the documentation of the provider to find out which data keys are expected (for example, for AWS the related documentation can be found at [Provider Secret Data](https://github.com/gardener/gardener-extension-provider-aws/blob/master/docs/usage-as-end-user.md#provider-secret-data)). Adapt `dev/70-secret-provider.yaml` and `dev/80-secretbinding.yaml` and apply them. ```bash kubectl apply -f dev/70-secret-provider.yaml kubectl apply -f dev/80-secretbinding.yaml ``` -After the SecretBinding creation, you are ready to proceed with the Shoot creation. You need to check the documentation of the provider to find out the expected configuration (for example for AWS the related documentation and example Shoot manifest can be found [here](https://github.com/gardener/gardener-extension-provider-aws/blob/master/docs/usage-as-end-user.md)). Adapt `dev/90-shoot.yaml` and apply it. +After the SecretBinding creation, you are ready to proceed with the Shoot creation. You need to check the documentation of the provider to find out the expected configuration (for example, for AWS the related documentation and example Shoot manifest can be found at [Using the AWS provider extension with Gardener as end-user](https://github.com/gardener/gardener-extension-provider-aws/blob/master/docs/usage-as-end-user.md)). Adapt `dev/90-shoot.yaml` and apply it. To make sure that a specific Seed cluster will be chosen or to skip the scheduling (the scheduling requires the Gardener Scheduler to be running), specify the `.spec.seedName` field (see [here](../../example/90-shoot.yaml#L317-L318)).
diff --git a/docs/development/log_parsers.md b/docs/development/log_parsers.md index 193d5b75668..7cec64353d9 100644 --- a/docs/development/log_parsers.md +++ b/docs/development/log_parsers.md @@ -1,4 +1,4 @@ -# How to create log parser for container into fluent-bit +# How to Create Log Parser for Container into fluent-bit If our log message is parsed correctly, it will be shown in Grafana like this: @@ -17,27 +17,26 @@ Otherwise, it will look like this: } ``` -## Lets make a custom parser now +## Create a Custom Parser -- First of all we need to know how does the log for the specific container look like (for example lets take a log from the `alertmanager` : +- First of all, we need to know what the log for the specific container looks like (for example, let's take a log from the `alertmanager`: `level=info ts=2019-01-28T12:33:49.362015626Z caller=main.go:175 build_context="(go=go1.11.2, user=root@4ecc17c53d26, date=20181109-15:40:48)`) - We can see that this log contains 4 subfields (severity=info, timestamp=2019-01-28T12:33:49.362015626Z, source=main.go:175 and the actual message). -So we have to write a regex which matches this log in 4 groups(We can use https://regex101.com/ like helping tool). So for this purpose our regex -looks like this: +So we have to write a regex which matches this log in 4 groups (we can use https://regex101.com/ as a helping tool). So, for this purpose, our regex looks like this: ```text ^level=(?\w+)\s+ts=(?