# feat: adds failure domain api for AWS and EKS #1347

base: main
@@ -0,0 +1,73 @@

+++
title = "AWS Failure Domain"
+++

The AWS failure domain customization allows the user to specify the AWS availability zone (failure domain) for worker node deployments.
This customization can be applied to individual MachineDeployments to distribute worker nodes across different availability zones for high availability.
This customization will be available when the
[provider-specific cluster configuration patch]({{< ref "..">}}) is included in the `ClusterClass`.

## Example

To specify a failure domain for worker nodes, use the following configuration:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: <NAME>
spec:
  topology:
    variables:
      - name: workerConfig
        value:
          aws:
            failureDomain: us-west-2a
```

To deploy workers across multiple availability zones, customize individual MachineDeployments using the `overrides` field:

```yaml
spec:
  topology:
    # ...
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          variables:
            overrides:
              - name: workerConfig
                value:
                  aws:
                    failureDomain: us-west-2a
        - class: default-worker
          name: md-1
          variables:
            overrides:
              - name: workerConfig
                value:
                  aws:
                    failureDomain: us-west-2b
        - class: default-worker
          name: md-2
          variables:
            overrides:
              - name: workerConfig
                value:
                  aws:
                    failureDomain: us-west-2c
```

## Resulting CAPA Configuration

Applying this configuration will result in the following value being set:

- worker `MachineDeployment`:

  ```yaml
  spec:
    template:
      spec:
        failureDomain: us-west-2a
  ```
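To illustrate the result, here is a minimal, hypothetical Go sketch (not part of this change) that lists the worker `MachineDeployments` on the management cluster and prints the failure domain each one ended up with; the `default` namespace and standard kubeconfig discovery are assumptions:

```go
package main

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/runtime"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	// Register the CAPI types with a fresh scheme.
	scheme := runtime.NewScheme()
	if err := clusterv1.AddToScheme(scheme); err != nil {
		panic(err)
	}

	// Connect to the management cluster via the usual kubeconfig discovery.
	restCfg, err := config.GetConfig()
	if err != nil {
		panic(err)
	}
	c, err := client.New(restCfg, client.Options{Scheme: scheme})
	if err != nil {
		panic(err)
	}

	// Print the failure domain the patch set on each worker MachineDeployment.
	var mds clusterv1.MachineDeploymentList
	if err := c.List(context.Background(), &mds, client.InNamespace("default")); err != nil {
		panic(err)
	}
	for _, md := range mds.Items {
		fd := "<unset>"
		if md.Spec.Template.Spec.FailureDomain != nil {
			fd = *md.Spec.Template.Spec.FailureDomain
		}
		fmt.Printf("%s: %s\n", md.Name, fd)
	}
}
```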
@@ -0,0 +1,16 @@

```go
// Copyright 2023 Nutanix. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

package failuredomain

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestFailureDomainPatch(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "AWS failure domain mutator suite")
}
```
@@ -0,0 +1,103 @@

```go
// Copyright 2023 Nutanix. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

package failuredomain

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/nutanix-cloud-native/cluster-api-runtime-extensions-nutanix/api/v1alpha1"
	"github.com/nutanix-cloud-native/cluster-api-runtime-extensions-nutanix/common/pkg/capi/clustertopology/handlers/mutation"
	"github.com/nutanix-cloud-native/cluster-api-runtime-extensions-nutanix/common/pkg/capi/clustertopology/variables"
)

const (
	// VariableName is the external patch variable name.
	VariableName = "failureDomain"
)

type awsFailureDomainWorkerPatchHandler struct {
	variableName      string
	variableFieldPath []string
}

func NewWorkerPatch() *awsFailureDomainWorkerPatchHandler {
	return NewAWSFailureDomainWorkerPatchHandler(
		v1alpha1.WorkerConfigVariableName,
		v1alpha1.AWSVariableName,
		VariableName,
	)
}

func NewAWSFailureDomainWorkerPatchHandler(
	variableName string,
	variableFieldPath ...string,
) *awsFailureDomainWorkerPatchHandler {
	return &awsFailureDomainWorkerPatchHandler{
		variableName:      variableName,
		variableFieldPath: variableFieldPath,
	}
}

func (h *awsFailureDomainWorkerPatchHandler) Mutate(
	ctx context.Context,
	obj *unstructured.Unstructured,
	vars map[string]apiextensionsv1.JSON,
	holderRef runtimehooksv1.HolderReference,
	_ client.ObjectKey,
	_ mutation.ClusterGetter,
) error {
	log := ctrl.LoggerFrom(ctx).WithValues(
		"holderRef", holderRef,
	)

	failureDomainVar, err := variables.Get[string](
		vars,
		h.variableName,
		h.variableFieldPath...,
	)
	if err != nil {
		if variables.IsNotFoundError(err) {
			log.V(5).Info("AWS failure domain variable for worker not defined")
			return nil
		}
		return err
	}

	log = log.WithValues(
		"variableName", h.variableName,
		"variableFieldPath", h.variableFieldPath,
		"variableValue", failureDomainVar,
	)

	// Check if this is a MachineDeployment.
	if obj.GetKind() != "MachineDeployment" || obj.GetAPIVersion() != clusterv1.GroupVersion.String() {
		log.V(5).Info("not a MachineDeployment, skipping")
		return nil
	}

	log.WithValues(
		"patchedObjectKind", obj.GetKind(),
		"patchedObjectName", client.ObjectKeyFromObject(obj),
	).Info("setting failure domain in worker MachineDeployment spec")

	if err := unstructured.SetNestedField(
		obj.Object,
		failureDomainVar,
		"spec", "template", "spec", "failureDomain",
	); err != nil {
		return err
	}

	return nil
}
```

> **Review comment** (on `NewWorkerPatch`): We should add failure domain patch for AWS control plane too.

> **Review comment** (on lines +84 to +99, the `SetNestedField` block): We should be using …
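As a standalone sketch of the underlying mechanism (assumed file layout, not code from this PR): `unstructured.SetNestedField` creates the intermediate `spec.template.spec` maps on demand, which is why the handler can patch a bare object without checking its existing shape.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/yaml"
)

func main() {
	// A bare MachineDeployment-shaped object with no spec at all.
	obj := &unstructured.Unstructured{Object: map[string]interface{}{}}
	obj.SetAPIVersion("cluster.x-k8s.io/v1beta1")
	obj.SetKind("MachineDeployment")

	// SetNestedField creates the intermediate maps along the path, so the
	// handler never needs to pre-create spec.template.spec.
	if err := unstructured.SetNestedField(
		obj.Object, "us-west-2a",
		"spec", "template", "spec", "failureDomain",
	); err != nil {
		panic(err)
	}

	out, err := yaml.Marshal(obj.Object)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// Prints (among other fields):
	//   spec:
	//     template:
	//       spec:
	//         failureDomain: us-west-2a
}
```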
> **Review comment:** Were you able to verify this manually for EKS? From the docs it uses `failureDomain: "1"` (https://cluster-api-aws.sigs.k8s.io/topics/failure-domains/worker-nodes#failure-domains-in-worker-nodes), but in the CAPA code it does look like it should be the availability zone, as you have it documented.