- Introduction
- Getting started
- Custom Resource Definitions (CRDs)
- Examples of API Usage
- Cluster API Provider Lifecycle
- Air-gapped Environment
The Cluster API Operator is a Kubernetes Operator designed to empower cluster administrators to handle the lifecycle of Cluster API providers within a management cluster using a declarative approach. It aims to improve user experience in deploying and managing Cluster API, making it easier to handle day-to-day tasks and automate workflows with GitOps.
This operator leverages a declarative API and extends the capabilities of the clusterctl CLI, allowing greater flexibility and configuration options for cluster administrators.
- Offers a declarative API that simplifies the management of Cluster API providers and enables GitOps workflows.
- Facilitates provider upgrades and downgrades, making them more convenient for distributed teams and CI pipelines.
- Aims to support air-gapped environments without direct access to GitHub/GitLab.
- Leverages controller-runtime configuration API for a more flexible Cluster API providers setup.
- Provides a transparent and effective way to interact with various Cluster API components on the management cluster.
The lexicon used in this document is described in more detail here. Any discrepancies should be rectified in the main Cluster API glossary.
- kubectl for interacting with the management cluster.
- Helm for installing the operator on the cluster (optional).
Before installing the Cluster API Operator, you must first ensure that cert-manager is installed, as the operator does not manage cert-manager installations. To install cert-manager, run the following command:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml
Wait for cert-manager to be ready before proceeding.
After cert-manager is successfully installed, you can install the Cluster API operator using one of the following methods:
Install the Cluster API operator directly by applying the latest release assets:
kubectl apply -f https://github.com/kubernetes-sigs/cluster-api-operator/releases/latest/download/operator-components.yaml
Alternatively, you can install the Cluster API operator using Helm charts:
helm repo add capi-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo update
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system
The operator Helm chart supports a "quickstart" option for bootstrapping a management cluster. The user experience is relatively similar to clusterctl init:
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure=docker:v1.4.2 --wait # core Cluster API with kubeadm bootstrap and control plane providers will also be installed
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure=docker,azure --wait # core Cluster API with kubeadm bootstrap and control plane providers will also be installed
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure="capd-custom-ns:docker:v1.4.2;capz-custom-ns:azure:v1.10.0" --wait # core Cluster API with kubeadm bootstrap and control plane providers will also be installed
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set core=cluster-api:v1.4.2 --set controlPlane=kubeadm:v1.4.2 --set bootstrap=kubeadm:v1.4.2 --set infrastructure=docker:v1.4.2 --wait
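The same settings can also be kept in a Helm values file instead of being passed as --set flags, which tends to suit GitOps workflows. The sketch below assumes the chart reads the same top-level keys that the --set flags in the last command use; the file name values.yaml is an arbitrary choice.

```yaml
# values.yaml (arbitrary file name)
# Mirrors the --set flags of the fully explicit helm install command above.
core: "cluster-api:v1.4.2"
controlPlane: "kubeadm:v1.4.2"
bootstrap: "kubeadm:v1.4.2"
infrastructure: "docker:v1.4.2"
```

The file could then be supplied with Helm's standard -f values.yaml option in place of the individual --set flags.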
For more complex operations, please refer to our API documentation.
The Cluster API Operator uses the controller-runtime library, making it compatible with all the options that the library provides. This offers flexibility when configuring the operator and allows you to benefit from the features offered by controller-runtime.
Some examples of controller-runtime configuration options you can use with the Cluster API Operator include:
Metrics: Controller-runtime enables you to collect and expose metrics about its internal behavior, such as the number of reconciliations executed by the operator over time. You can customize the metrics endpoint and the metrics scraping interval, among other settings.
Leader Election: To ensure high availability of the operator, you can enable leader election when running multiple replicas. Controller-runtime allows you to set the leader election resource lock and polling interval to suit your needs.
Logger: The operator allows you to use controller-runtime logging options to configure the logging subsystem. You can choose the logging level and output format, and even enable logging for specific libraries or components.
Here's an example of how you can configure the Cluster API Operator deployment with some of these options:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-api-operator
  namespace: capi-operator-system
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            - --metrics-bind-addr=:8080
            - --leader-elect
            - --leader-elect-retry-period=5s
            - --v=5
For complete details on the available configuration options, you can execute:
docker run -it --rm registry.k8s.io/capi-operator/cluster-api-operator:${CAPI_OPERATOR_VERSION} /manager --help
In this section, we will walk you through the basic process of installing Cluster API providers using the operator. The Cluster API operator manages four types of objects:
- CoreProvider
- BootstrapProvider
- ControlPlaneProvider
- InfrastructureProvider
Please note that this example provides a basic configuration of the Azure infrastructure provider for getting started. More detailed examples and CRD descriptions are provided in subsequent sections of this document.
The first step is to install the CoreProvider, which is responsible for managing the Cluster API CRDs and the Cluster API controller.
You can use any existing namespace for providers. However, before creating a provider object, make sure the specified namespace has been created. In the example below, we use the capi-system namespace. You can create this namespace either from the command line by running kubectl create namespace capi-system, or by using the declarative approach described in the official Kubernetes documentation.
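For the declarative route, a minimal Namespace manifest would look roughly like the following. It can be applied with kubectl apply -f before the provider object is created.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  # Namespace that will hold the CoreProvider object.
  name: capi-system
```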
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  version: v1.4.3
Note: Only one CoreProvider can be installed at the same time on a single cluster.
Next, install the Azure infrastructure provider. Before that, ensure that the capz-system namespace exists.
Since the provider requires variables to be set, create a secret containing them in the same namespace as the provider. It is also recommended to include a github-token in the secret. This token is used to fetch the provider repository; without it, the operator may exceed the rate limit of the GitHub API, in which case the provider cannot be installed. Like clusterctl, the token needs only the repo scope.
apiVersion: v1
kind: Secret
metadata:
  name: azure-variables
  namespace: capz-system
type: Opaque
stringData:
  github-token: ghp_fff
---
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: azure
  namespace: capz-system
spec:
  version: v1.9.3
  secretName: azure-variables
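The remaining two provider types follow the same pattern. As a sketch, the kubeadm bootstrap and control plane providers might be declared as shown here. The bootstrap namespace name is an assumption (any existing namespace should work), and the version is chosen to match the CoreProvider above.

```yaml
---
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: BootstrapProvider
metadata:
  name: kubeadm
  # Assumed namespace name; it must exist before this object is created.
  namespace: capi-kubeadm-bootstrap-system
spec:
  version: v1.4.3
---
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: ControlPlaneProvider
metadata:
  name: kubeadm
  # Must also exist beforehand.
  namespace: capi-kubeadm-control-plane-system
spec:
  version: v1.4.3
```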
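To remove the installed providers and all related Kubernetes objects, delete the following CRs. Delete the infrastructure provider first, because deletion of the core provider is blocked while other providers remain:
kubectl delete infrastructureprovider azure -n capz-system
kubectl delete coreprovider cluster-api -n capi-system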
The Cluster API Operator introduces new API types: CoreProvider, BootstrapProvider, ControlPlaneProvider, and InfrastructureProvider. These four provider types share common Spec and Status types, ProviderSpec and ProviderStatus, respectively.
The CRDs are scoped to be namespaced, allowing RBAC restrictions to be enforced if needed. This scoping also enables the installation of multiple versions of controllers (grouped within namespaces) in the same management cluster.
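Because the provider objects are namespaced, access to them can be limited with an ordinary Role and RoleBinding. The following is only a sketch: the Role name, the group capz-admins, and the chosen verbs are hypothetical, and the resource name assumes the conventional lowercase plural of the kind.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: infrastructure-provider-editor   # hypothetical name
  namespace: capz-system
rules:
  - apiGroups: ["operator.cluster.x-k8s.io"]
    # Assumed plural resource name for the InfrastructureProvider kind.
    resources: ["infrastructureproviders"]
    verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: infrastructure-provider-editor
  namespace: capz-system
subjects:
  - kind: Group
    name: capz-admins                     # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: infrastructure-provider-editor
  apiGroup: rbac.authorization.k8s.io
```

Members of the group could then manage infrastructure providers in capz-system only, without gaining access to provider objects in other namespaces.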
To better understand how the API can be used, please refer to the Example API Usage section.
Related Golang structs can be found in the Cluster API Operator repository.
Below are the new API types being defined, with shared types used for Spec and Status among the different provider types (Core, Bootstrap, ControlPlane, and Infrastructure):
type CoreProvider struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   ProviderSpec   `json:"spec,omitempty"`
    Status ProviderStatus `json:"status,omitempty"`
}
type BootstrapProvider struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   ProviderSpec   `json:"spec,omitempty"`
    Status ProviderStatus `json:"status,omitempty"`
}
type ControlPlaneProvider struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   ProviderSpec   `json:"spec,omitempty"`
    Status ProviderStatus `json:"status,omitempty"`
}
type InfrastructureProvider struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   ProviderSpec   `json:"spec,omitempty"`
    Status ProviderStatus `json:"status,omitempty"`
}
The following sections provide details about ProviderSpec and ProviderStatus, which are shared among all the provider types: Core, Bootstrap, ControlPlane, and Infrastructure.
ProviderSpec: desired state of the Provider, consisting of:
- Version (string): provider version (e.g., "v0.1.0")
- Manager (optional ManagerSpec): controller manager properties for the provider
- Deployment (optional DeploymentSpec): deployment properties for the provider
- SecretName (optional string): name of the secret that contains provider credentials
- SecretNamespace (optional string): namespace of the secret that contains provider credentials
- FetchConfig (optional FetchConfiguration): how the operator will fetch components and metadata
YAML example:
...
spec:
  version: "v0.1.0"
  manager:
    maxConcurrentReconciles: 5
  deployment:
    replicas: 1
  secretName: "provider-secret"
  fetchConfig:
    url: "https://github.com/owner/repo/releases"
...
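As a further illustration, the fragment below points a provider at a secret stored in a different namespace. The key secretNamespace is assumed to be the camelCase form of the SecretNamespace field, in line with the other keys in the example above, and the shared-credentials namespace is hypothetical.

```yaml
spec:
  version: "v0.1.0"
  secretName: "provider-secret"
  # Assumed key name; the namespace is a hypothetical placeholder.
  secretNamespace: "shared-credentials"
```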
ManagerSpec: controller manager properties for the provider, consisting of:
- ProfilerAddress (optional string): pprof profiler bind address (e.g., "localhost:6060")
- MaxConcurrentReconciles (optional int): maximum number of concurrent reconciles
- Verbosity (optional int): logs verbosity
- FeatureGates (optional map[string]bool): provider specific feature flags
YAML example:
...
spec:
  manager:
    profilerAddress: "localhost:6060"
    maxConcurrentReconciles: 5
    verbosity: 1
    featureGates:
      FeatureA: true
      FeatureB: false
...
DeploymentSpec: deployment properties for the provider, consisting of:
- Replicas (optional int): number of desired pods
- NodeSelector (optional map[string]string): node label selector
- Tolerations (optional []corev1.Toleration): pod tolerations
- Affinity (optional corev1.Affinity): pod scheduling constraints
- Containers (optional []ContainerSpec): list of deployment containers
- ServiceAccountName (optional string): pod service account
- ImagePullSecrets (optional []corev1.LocalObjectReference): list of image pull secrets specified in the Deployment
YAML example:
...
spec:
  deployment:
    replicas: 2
    nodeSelector:
      disktype: ssd
    tolerations:
      - key: "example"
        operator: "Exists"
        effect: "NoSchedule"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: "example"
                  operator: "In"
                  values:
                    - "true"
    containers:
      - name: "containerA"
        image:
          repository: "example.com/repo"
          name: "image-name"
          tag: "v1.0.0"
        args:
          exampleArg: "value"
...
ContainerSpec: container properties for the provider, consisting of:
- Name (string): container name
- Image (optional ImageMeta): container image metadata
- Args (optional map[string]string): extra provider specific flags
- Env (optional []corev1.EnvVar): environment variables
- Resources (optional corev1.ResourceRequirements): compute resources
- Command (optional []string): override container's entrypoint array
YAML example:
...
spec:
  deployment:
    containers:
      - name: "example-container"
        image:
          repository: "example.com/repo"
          name: "image-name"
          tag: "v1.0.0"
        args:
          exampleArg: "value"
        env:
          - name: "EXAMPLE_ENV"
            value: "example-value"
        resources:
          limits:
            cpu: "1"
            memory: "1Gi"
          requests:
            cpu: "500m"
            memory: "500Mi"
        command:
          - "/bin/bash"
...
ImageMeta: container image customization, consisting of:
- Repository (optional string): image registry (e.g., "example.com/repo")
- Name (optional string): image name (e.g., "provider-image")
- Tag (optional string): image tag (e.g., "v1.0.0")
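The other types in this section come with a YAML example; for the image fields, a corresponding fragment might look like this. It reuses the placeholder values from the field descriptions, and the container name manager is an assumption taken from the usage examples in this document.

```yaml
spec:
  deployment:
    containers:
      - name: manager                      # assumed container name
        image:
          repository: "example.com/repo"   # image registry
          name: "provider-image"           # image name
          tag: "v1.0.0"                    # image tag
```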
FetchConfiguration: components and metadata fetch options, consisting of:
- URL (optional string): URL for remote GitHub repository releases (e.g., "https://github.com/owner/repo/releases")
- Selector (optional metav1.LabelSelector): label selector to use for fetching provider components and metadata from ConfigMaps stored in the cluster
YAML example:
...
spec:
  fetchConfig:
    url: "https://github.com/owner/repo/releases"
    selector:
      matchLabels:
...
ProviderStatus: observed state of the Provider, consisting of:
- Contract (optional string): core provider contract being adhered to (e.g., "v1beta1")
- Conditions (optional clusterv1.Conditions): current service state of the provider
- ObservedGeneration (optional int64): latest generation observed by the controller
- InstalledVersion (optional string): version of the provider that is installed
YAML example:
status:
  contract: "v1beta1"
  conditions:
    - type: "Ready"
      status: "True"
      reason: "ProviderAvailable"
      message: "Provider is available and ready"
  observedGeneration: 1
  installedVersion: "v0.1.0"
In this section we provide some concrete examples of CAPI Operator API usage for various use-cases.
- As an admin, I want to install the aws infrastructure provider with specific controller flags.
apiVersion: v1
kind: Secret
metadata:
  name: aws-variables
  namespace: capa-system
type: Opaque
---
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: aws
  namespace: capa-system
spec:
  version: v2.1.4
  secretName: aws-variables
  manager:
    # These top level controller manager flags, supported by all the providers.
    # These flags come with sensible defaults, thus requiring no or minimal
    # changes for the most common scenarios.
    metrics:
      bindAddress: ":8181"
    syncPeriod: "500s"
  fetchConfig:
    url: https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases
  deployment:
    containers:
      - name: manager
        args:
          # These are controller flags that are specific to a provider; usage
          # is reserved for advanced scenarios only.
          "--awscluster-concurrency": "12"
          "--awsmachine-concurrency": "11"
- As an admin, I want to install aws infrastructure provider but override the container image of the CAPA deployment.
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: aws
  namespace: capa-system
spec:
  version: v2.1.4
  secretName: aws-variables
  deployment:
    containers:
      - name: manager
        image:
          repository: "gcr.io/myregistry"
          name: "capa-controller"
          tag: "v2.1.4-foo"
- As an admin, I want to change the resource limits for the manager pod in my control plane provider deployment.
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: ControlPlaneProvider
metadata:
  name: kubeadm
  namespace: capi-kubeadm-control-plane-system
spec:
  version: v1.4.3
  secretName: capi-variables
  deployment:
    containers:
      - name: manager
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 20Mi
- As an admin, I would like to fetch my azure provider components from a specific repository which is not the default.
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: myazure
  namespace: capz-system
spec:
  version: v1.9.3
  secretName: azure-variables
  fetchConfig:
    url: https://github.com/myorg/awesome-azure-provider/releases
- As an admin, I would like to use the default fetch configurations by simply specifying the expected Cluster API provider names, such as vsphere or cluster-api, instead of having to explicitly specify the fetch configuration. In the example below, since we are using 'vsphere' as the name of the InfrastructureProvider, the operator will fetch its configuration from url: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/releases by default.
See more examples in the air-gapped environment section
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: vsphere
  namespace: capv-system
spec:
  version: v1.6.1
  secretName: vsphere-variables
This section covers the lifecycle of Cluster API providers managed by the Cluster API Operator, including installing, upgrading, modifying, and deleting a provider.
To install a new Cluster API provider with the Cluster API Operator, create a provider object together with the secret that holds its variables, as shown in the first API usage example.
The operator processes a provider object by applying the following rules:
- The CoreProvider is installed first; other providers will be requeued until the core provider exists.
- Before installing any provider, the following pre-flight checks are executed:
- No other instance of the same provider (same Kind, same name) should exist in any namespace.
- The Cluster API contract (e.g., v1beta1) must match the contract of the core provider.
- The operator sets conditions on the provider object to surface any installation issues, including pre-flight checks and/or order of installation.
- If the FetchConfiguration is not defined, the operator applies the embedded fetch configuration for the given kind and name specified in the Cluster API code.
The installation process, managed by the operator, aligns with the implementation underlying the clusterctl init command and includes these steps:
- Fetching provider artifacts (the components.yaml and metadata.yaml files).
- Applying image overrides, if any.
- Replacing variables in the infrastructure-components from EnvVar and Secret.
- Applying the resulting YAML to the cluster.
Differences between the operator and clusterctl init:
- The operator installs one provider at a time, while clusterctl init installs a group of providers in a single operation.
- The operator stores fetched artifacts in a config map for reuse during subsequent reconciliations.
- The operator uses a Secret, while clusterctl init relies on environment variables and a local configuration file.
To trigger an upgrade for a Cluster API provider, change the spec.Version field. All providers must follow the golden rule of respecting the same Cluster API contract supported by the core provider.
The operator performs the upgrade by:
- Deleting the current provider components, while preserving CRDs, namespaces, and user objects.
- Installing the new provider components.
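As a sketch of the upgrade trigger, the CoreProvider from the getting started section could be re-applied with only its version changed. The target version v1.4.4 is purely illustrative; pick whichever version keeps the contract shared by your other providers.

```yaml
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  # Previously v1.4.3; changing this field is what triggers the upgrade.
  # v1.4.4 is an illustrative target, not a recommendation.
  version: v1.4.4
```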
Differences between the operator and clusterctl upgrade apply:
- The operator upgrades one provider at a time, while clusterctl upgrade apply upgrades a group of providers in a single operation.
- With the declarative approach, users are responsible for manually editing the Provider objects' YAML, while clusterctl upgrade apply --contract automatically determines the latest available versions for each provider.
In addition to changing a provider version (upgrades), the operator supports modifying other provider fields such as controller flags and variables. This can be achieved by applying kubectl edit or kubectl apply to the provider object.
The operation works similarly to upgrades: The current provider instance is deleted while preserving CRDs, namespaces, and user objects. Then, a new provider instance with the updated flags/variables is installed.
Note: clusterctl currently does not support this operation.
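For instance, raising the log verbosity of an already installed provider could be expressed as below. This is a sketch that reuses the Azure provider from the getting started section; the verbosity value is arbitrary.

```yaml
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: azure
  namespace: capz-system
spec:
  version: v1.9.3              # unchanged, so this is a modification, not an upgrade
  secretName: azure-variables
  manager:
    verbosity: 3               # newly added setting; arbitrary value
```

Re-applying the object with kubectl apply should then cause the operator to reinstall the provider components with the new setting, following the delete-and-install sequence described above.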
To delete a provider, remove the corresponding provider object. Provider deletion will be blocked if any workload clusters using the provider still exist. Furthermore, deletion of a core provider is blocked if other providers remain in the management cluster.
To install Cluster API providers in an air-gapped environment using the operator, address the following issues:
- Configure the operator for an air-gapped environment:
- Manually fetch and store a helm chart for the operator.
- Provide image overrides for the operator from an accessible image repository.
- Configure providers for an air-gapped environment:
- Provide fetch configuration for each provider from an accessible location (e.g., an internal GitHub repository) or from pre-created ConfigMaps within the cluster.
- Provide image overrides for each provider to pull images from an accessible image repository.
Example Usage:
As an admin, I need to fetch the Azure provider components from within the cluster because I am working in an air-gapped environment.
In this example, there is a ConfigMap in the capz-system namespace that defines the components and metadata of the provider.
The Azure InfrastructureProvider is configured with a fetchConfig specifying the label selector, allowing the operator to determine the available versions of the Azure provider. Since the provider's version is marked as v1.9.3, the operator uses the components information from the ConfigMap with the matching label to install the Azure provider.
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    provider-components: azure
  name: v1.9.3
  namespace: capz-system
data:
  components: |
    # Components for v1.9.3 YAML go here
  metadata: |
    # Metadata information goes here
---
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: azure
  namespace: capz-system
spec:
  version: v1.9.3
  secretName: azure-variables
  fetchConfig:
    selector:
      matchLabels:
        provider-components: azure
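In an air-gapped setup the fetch configuration is typically paired with an image override, as listed in the requirements above. One possible combination is sketched here. The registry registry.internal.example.com and the image name are hypothetical placeholders for an internal mirror, and the container name manager is assumed from the earlier usage examples.

```yaml
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: azure
  namespace: capz-system
spec:
  version: v1.9.3
  secretName: azure-variables
  fetchConfig:
    # Read components and metadata from in-cluster ConfigMaps.
    selector:
      matchLabels:
        provider-components: azure
  deployment:
    containers:
      - name: manager            # assumed container name
        image:
          # Hypothetical internal mirror; replace with your own registry.
          repository: "registry.internal.example.com/capz"
          name: "cluster-api-azure-controller"
          tag: "v1.9.3"
```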
There is a limit on the maximum size of a ConfigMap: 1MiB. If the manifests do not fit within this limit, Kubernetes will return an error and the provider installation will fail. To avoid this, you can compress the manifests before putting them in the ConfigMap.
For example, suppose you have two files: components.yaml and metadata.yaml. To create a working ConfigMap:
- Compress components.yaml using the gzip CLI tool
gzip -c components.yaml > components.gz
- Create a configmap manifest from the archived data
kubectl create configmap v1.9.3 --namespace=capz-system --from-file=components=components.gz --from-file=metadata=metadata.yaml --dry-run=client -o yaml > configmap.yaml
- Edit the file by adding "provider.cluster.x-k8s.io/compressed: true" annotation
yq eval -i '.metadata.annotations += {"provider.cluster.x-k8s.io/compressed": "true"}' configmap.yaml
Note: without this annotation, the operator won't be able to determine whether the data is compressed.
- Add labels that will be used to match the configmap in the fetchConfig section of the provider
yq eval -i '.metadata.labels += {"my-label": "label-value"}' configmap.yaml
- Create a configmap in your kubernetes cluster using kubectl
kubectl create -f configmap.yaml
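After these steps, configmap.yaml should look roughly like the first document below (payloads abbreviated). This sketch assumes kubectl stores the gzip output under binaryData, since it is not valid UTF-8 text, and it reuses the example my-label label from the previous step. The second document is a fragment showing the provider selector that would match that label.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: v1.9.3
  namespace: capz-system
  annotations:
    # Tells the operator that the components are gzip-compressed.
    provider.cluster.x-k8s.io/compressed: "true"
  labels:
    my-label: label-value
binaryData:
  components: "<base64-encoded gzip data>"   # placeholder, not real data
data:
  metadata: |
    # contents of metadata.yaml
---
# Fragment of the provider object: the selector must match the label above.
spec:
  fetchConfig:
    selector:
      matchLabels:
        my-label: label-value
```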