Node Group Creation/Upgrade Failure: Instances Failed to Join Kubernetes Cluster with labels #3283

Closed as not planned
@tanawatcha

Description

I encountered issues with node group creation and upgrades using the terraform-aws-eks module version v20.31.6. Below are the details:

  1. During the initial creation of a node group with the labels configuration, the process took 20 minutes and failed with the error:
    NodeCreationFailure: Instances failed to join the Kubernetes cluster
    Removing the labels configuration allowed the node group to be created successfully. I then re-enabled the labels and ran terraform apply, which updated the labels successfully (see the sketch after this list).

  2. While upgrading a node group with use_latest_ami_release_version = true, the process took a long time and failed with the error:
    NodeCreationFailure: Couldn't proceed with upgrade process as new nodes are not joining the node group.
    I suspect this issue might be related to the labels configuration, similar to issue 1.
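
For reference, the workaround in item 1 only toggles the labels block on the managed node group; nothing else changes between the two applies. A minimal sketch, trimmed down from the reproduction code below:

  eks_managed_node_groups = {
    default_al2023_ng = {
      ami_type       = "AL2023_x86_64_STANDARD"
      instance_types = ["c5.large"]

      # First apply: leave this block out so the node group can be created.
      # Second apply: add it back; the labels are then updated in place successfully.
      labels = {
        "karpenter.sh/controller" = "true"
      }
    }
  }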

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version: v20.31.6
  • Terraform version: Terraform v1.5.7 on darwin_arm64
  • Provider version(s): hashicorp/aws 5.84.0

Reproduction Code [Required]

module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # Developed against git tag v20.31.6
  version         = "~> 20.31"
  cluster_name    = "eks-cluster-test"
  cluster_version = "1.31"
  vpc_id                   = data.aws_vpc.main.id
  subnet_ids               = data.aws_subnet.private_subnets[*].id
  control_plane_subnet_ids = data.aws_subnet.private_eks_subnets[*].id
  cluster_endpoint_public_access       = true
  cluster_endpoint_public_access_cidrs = var.eks_public_access_cidrs
  cluster_service_ipv4_cidr = var.cluster_service_ipv4_cidr
  enable_irsa = true
  cluster_addons = {
    aws-ebs-csi-driver = {
      most_recent = true
    }
    coredns = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"
      timeouts = {
        create = "25m"
        delete = "10m"
      }
    }
    kube-proxy = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"
    }
    vpc-cni = {
      most_recent       = true
      before_compute    = true
      resolve_conflicts = "OVERWRITE"

      configuration_values = jsonencode({
        env = {
          AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
          ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
  }
  cluster_upgrade_policy = {
    support_type = "STANDARD"
  }
  eks_managed_node_groups = {
    default_al2023_ng = {
      name                           = "al2023-df-ng"
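      # Pulls the latest recommended AMI release version; tied to the upgrade failure in item 2.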
      use_latest_ami_release_version = true
      ami_type                       = "AL2023_x86_64_STANDARD"
      instance_types                 = ["c5.large"]
      capacity_type                  = "ON_DEMAND"
      subnet_ids                     = data.aws_subnet.private_subnets[*].id
      disk_size                      = 100
      min_size                       = 1
      max_size                       = 3
      desired_size                   = 2
      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size = 100
            volume_type = "gp3"
            iops        = 3000
            throughput  = 150
            delete_on_termination = true
          }
        }
      }
      cloudinit_pre_nodeadm = [
        {
          content_type = "application/node.eks.aws"
          content      = <<-EOT
            ---
            apiVersion: node.eks.aws/v1alpha1
            kind: NodeConfig
            spec:
              kubelet:
                config:
                  shutdownGracePeriod: 30s
                  featureGates:
                    DisableKubeletCloudCredentialProviders: true
          EOT
        }
      ]
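      # The labels block discussed in this issue; present from the initial create.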
      labels = {
        "karpenter.sh/controller"    = "true"
      }
    }

  }
  enable_cluster_creator_admin_permissions = true
  authentication_mode                      = "API_AND_CONFIG_MAP"
  access_entries = {
    adminteam = {
      principal_arn = iam_role_arn_xxxxx
      policy_associations = {
        admin_policy = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            namespaces = []
            type       = "cluster"
          }
        }
      }
    }
  }
  node_security_group_tags = merge(var.tags, {
    "karpenter.sh/discovery" = "eks-cluster-test"
  })
  tags = var.tags
}

Steps to reproduce the behavior:

  1. Create a node group with the provided configuration.
  2. Observe failure during the creation process (NodeCreationFailure error).
  3. Remove the labels configuration and reapply Terraform.
  4. Re-enable labels and apply Terraform again (successfully updates labels).
  5. Enable use_latest_ami_release_version = true and attempt to upgrade.
  6. Observe the upgrade failure with the error: NodeCreationFailure: Couldn't proceed with upgrade process as new nodes are not joining the node group.

Expected behavior

  1. Node group creation should succeed with labels configured.
  2. Node group upgrade should proceed successfully when use_latest_ami_release_version = true with labels configured.

Actual behavior

  1. Node group creation fails when labels are configured initially.

  2. Node group upgrade fails with the error: NodeCreationFailure: Couldn't proceed with upgrade process as new nodes are not joining the node group.

Additional context

  1. The issue seems to be resolved temporarily by removing labels during the initial creation and reapplying them later. However, the upgrade process remains problematic when use_latest_ami_release_version = true.
  2. I suspect the issue might be related to AMI updates or the interaction between use_latest_ami_release_version and labels.
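
As additional context for item 2: a possible (untested) workaround sketch would be to pin the AMI release version explicitly instead of using use_latest_ami_release_version, so the rollout happens on a known version. This assumes the EKS optimized-AMI SSM parameter path below is correct for AL2023 on 1.31, and that the module's ami_release_version input maps to the node group release_version; please verify both:

  data "aws_ssm_parameter" "al2023_release_version" {
    # Recommended AMI release version published for the AL2023 x86_64 standard variant.
    name = "/aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/standard/recommended/release_version"
  }

  # Inside the node group definition, in place of use_latest_ami_release_version = true:
  #   ami_release_version = nonsensitive(data.aws_ssm_parameter.al2023_release_version.value)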
