
fix arm64 nodepool for kind #8156


Merged: 1 commit merged into kubernetes:main on Jun 3, 2025

@upodroid (Member) commented Jun 3, 2025

/cc @ameukam @aojea @BenTheElder

The kernel bug also appeared in the kind jobs running on arm64 nodes, so I changed their OS too.

C4D is in preview, so I'm forcing the nodepool to always run at least 10 nodes of this type per zone. According to the nodepool size metrics, the pool has consistently stayed above 20 nodes per zone for weeks. The autoscaler seems to prefer the C4 pool over the C4D pool, which isn't ideal.

[image]
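For readers less familiar with GKE node pool autoscaling, here is a minimal sketch of the setting this PR changes. It uses the `google_container_node_pool` resource directly. The real change went through the repository's `../modules/gke-nodepool` module, whose input names are not shown on this page, so this is not the PR's diff. The cluster name and region come from the plan output below. The resource label, pool name, and `max_node_count` value are hypothetical placeholders, and the machine type is an assumption inferred from the module name.

```hcl
# Hypothetical sketch, not the module code changed in this PR.
resource "google_container_node_pool" "c4d_example" {
  name     = "c4d-highmem-8-example" # hypothetical pool name
  cluster  = "prow-build"            # cluster name as shown in the plan
  location = "us-central1"           # regional pool: nodes are spread across the region's zones

  node_config {
    machine_type = "c4d-highmem-8" # assumed from the module name
  }

  autoscaling {
    # For a regional pool, min/max are per-zone limits, so the autoscaler
    # will not scale any single zone of this pool below 10 nodes.
    min_node_count = 10
    max_node_count = 80 # hypothetical upper bound

    # BALANCED asks the autoscaler to keep zone sizes roughly even
    # (this is the value the existing pool reports in the plan).
    location_policy = "BALANCED"
  }
}
```

The instance group URLs in the plan list three zones: us-central1-a, us-central1-b, and us-central1-c. A per-zone floor of 10 therefore keeps at least 10 × 3 = 30 C4D nodes in the pool. The provider also offers `total_min_node_count` and `total_max_node_count`, which count nodes across all zones instead. These cannot be combined with the per-zone `min_node_count` and `max_node_count`.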

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/infra Infrastructure management, infrastructure design, code in infra/ area/infra/gcp Issues or PRs related to Kubernetes GCP infrastructure area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters labels Jun 3, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: upodroid

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/terraform Terraform modules, testing them, writing more of them, code in infra/gcp/clusters/ sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 3, 2025
@k8s-infra-ci-robot (Contributor)

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
~ update in-place
+/- create replacement and then destroy

Terraform will perform the following actions:

  # google_vmwareengine_network_peering.gvce_peering will be updated in-place
~ resource "google_vmwareengine_network_peering" "gvce_peering" {
      ~ export_custom_routes_with_public_ip = false -> true
        id                                  = "projects/k8s-infra-prow-build/locations/global/networkPeerings/peer-with-gcve-project"
      ~ import_custom_routes_with_public_ip = false -> true
        name                                = "peer-with-gcve-project"
        # (13 unchanged attributes hidden)
    }

  # module.prow_build_nodepool_c4d_highmem_8_localssd.google_container_node_pool.node_pool must be replaced
+/- resource "google_container_node_pool" "node_pool" {
      ~ id                          = "projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool7-20250528124554315100000001" -> (known after apply)
      ~ initial_node_count          = 5 -> 1 # forces replacement
      ~ instance_group_urls         = [
          - "https://www.googleapis.com/compute/v1/projects/k8s-infra-prow-build/zones/us-central1-a/instanceGroupManagers/gke-prow-build-pool7-2025052812455431-48d85bd4-grp",
          - "https://www.googleapis.com/compute/v1/projects/k8s-infra-prow-build/zones/us-central1-b/instanceGroupManagers/gke-prow-build-pool7-2025052812455431-f2ab4558-grp",
          - "https://www.googleapis.com/compute/v1/projects/k8s-infra-prow-build/zones/us-central1-c/instanceGroupManagers/gke-prow-build-pool7-2025052812455431-2e4cd1ad-grp",
        ] -> (known after apply)
      ~ managed_instance_group_urls = [
          - "https://www.googleapis.com/compute/beta/projects/k8s-infra-prow-build/zones/us-central1-a/instanceGroups/gke-prow-build-pool7-2025052812455431-48d85bd4-grp",
          - "https://www.googleapis.com/compute/beta/projects/k8s-infra-prow-build/zones/us-central1-b/instanceGroups/gke-prow-build-pool7-2025052812455431-f2ab4558-grp",
          - "https://www.googleapis.com/compute/beta/projects/k8s-infra-prow-build/zones/us-central1-c/instanceGroups/gke-prow-build-pool7-2025052812455431-2e4cd1ad-grp",
        ] -> (known after apply)
      + max_pods_per_node           = (known after apply)
      ~ name                        = "pool7-20250528124554315100000001" -> (known after apply)
      ~ node_count                  = 5 -> (known after apply)
      + operation                   = (known after apply)
      ~ version                     = "1.32.3-gke.1927009" -> (known after apply)
        # (5 unchanged attributes hidden)

      ~ autoscaling {
          ~ location_policy      = "BALANCED" -> (known after apply)
          ~ min_node_count       = 5 -> 10
          - total_max_node_count = 0 -> null
          - total_min_node_count = 0 -> null
            # (1 unchanged attribute hidden)
        }

      ~ network_config (known after apply)

      ~ node_config {
          ~ effective_taints            = [] -> (known after apply)
          - enable_confidential_storage = false -> null
          ~ labels                      = {} -> (known after apply)
          ~ local_ssd_count             = 0 -> (known after apply)
          ~ logging_variant             = "DEFAULT" -> (known after apply)
          + min_cpu_platform            = (known after apply)
          - resource_labels             = {
              - "goog-gke-node-pool-provisioning-model" = "on-demand"
            } -> null
          - resource_manager_tags       = {} -> null
          - storage_pools               = [] -> null
          - tags                        = [] -> null
            # (13 unchanged attributes hidden)

          ~ confidential_nodes (known after apply)

          - ephemeral_storage_local_ssd_config {
              - local_ssd_count = 1 -> null
            }

          ~ gcfs_config (known after apply)

          ~ guest_accelerator (known after apply)

          ~ kubelet_config (known after apply)
          - kubelet_config {
              - allowed_unsafe_sysctls                 = [] -> null
              - container_log_max_files                = 0 -> null
              - cpu_cfs_quota                          = false -> null
              - image_gc_high_threshold_percent        = 0 -> null
              - image_gc_low_threshold_percent         = 0 -> null
              - insecure_kubelet_readonly_port_enabled = "TRUE" -> null
              - pod_pids_limit                         = 0 -> null
                # (5 unchanged attributes hidden)
            }

          ~ shielded_instance_config (known after apply)
          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = false -> null
            }

          ~ windows_node_config (known after apply)
          - windows_node_config {
                # (1 unchanged attribute hidden)
            }

            # (1 unchanged block hidden)
        }

      ~ upgrade_settings (known after apply)
      - upgrade_settings {
          - max_surge       = 1 -> null
          - max_unavailable = 0 -> null
          - strategy        = "SURGE" -> null
        }

        # (1 unchanged block hidden)
    }

Plan: 1 to add, 1 to change, 1 to destroy.
  • ▶️ To apply this plan, comment:
    atlantis apply -d infra/gcp/terraform/k8s-infra-prow-build
  • 🚮 To delete this plan and lock, click here
  • 🔁 To plan this project again, comment:
    atlantis plan -d infra/gcp/terraform/k8s-infra-prow-build

Plan: 1 to add, 1 to change, 1 to destroy.


  • ⏩ To apply all unapplied plans from this Pull Request, comment:
    atlantis apply
  • 🚮 To delete all plans and locks from this Pull Request, comment:
    atlantis unlock

@upodroid (Member, Author) commented Jun 3, 2025

atlantis plan

@k8s-infra-ci-robot (Contributor)

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Plan Error

running 'sh -c' '/atlantis/bin/terraform1.12.1 init -input=false -upgrade' in '/atlantis/repos/kubernetes/k8s.io/8156/default/infra/gcp/terraform/k8s-infra-prow-build': exit status 1
Initializing the backend...
Upgrading modules...
Downloading registry.terraform.io/terraform-google-modules/iam/google 8.1.0 for iam...
- iam in .terraform/modules/iam/modules/projects_iam
- iam.helper in .terraform/modules/iam/modules/helper
- project in ../modules/gke-project
- prow_build_cluster in ../modules/gke-cluster
- prow_build_nodepool_c4_highmem_8_localssd in ../modules/gke-nodepool
- prow_build_nodepool_c4a_highmem_8_localssd in ../modules/gke-nodepool
- prow_build_nodepool_c4d_highmem_8_localssd in ../modules/gke-nodepool
- prow_build_nodepool_n1_highmem_8_localssd in ../modules/gke-nodepool
Downloading git::https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git?ref=v39.0.0&depth=1 for sig_node_node_pool_1_n4_highmem_8...
- sig_node_node_pool_1_n4_highmem_8 in .terraform/modules/sig_node_node_pool_1_n4_highmem_8/modules/gke-nodepool
- workload_identity_service_accounts in ../modules/workload-identity-service-account
Initializing provider plugins...
- Finding hashicorp/google-beta versions matching ">= 6.28.0, ~> 6.31.0, < 7.0.0"...
- Finding hashicorp/google versions matching ">= 3.53.0, >= 6.28.0, ~> 6.31.0, < 7.0.0"...
- Installing hashicorp/google-beta v6.31.1...
- Installing hashicorp/google v6.31.1...
- Installed hashicorp/google v6.31.1 (signed by HashiCorp)
╷
│ Error: Failed to install provider
│ 
│ Error while installing hashicorp/google-beta v6.31.1: open
│ /atlantis/plugin-cache/registry.terraform.io/hashicorp/google-beta/6.31.1/linux_amd64/terraform-provider-google-beta_v6.31.1_x5:
│ text file busy
╵

@upodroid (Member, Author) commented Jun 3, 2025

atlantis plan

@k8s-infra-ci-robot (Contributor)

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
~ update in-place

Terraform will perform the following actions:

  # google_vmwareengine_network_peering.gvce_peering will be updated in-place
~ resource "google_vmwareengine_network_peering" "gvce_peering" {
      ~ export_custom_routes_with_public_ip = false -> true
        id                                  = "projects/k8s-infra-prow-build/locations/global/networkPeerings/peer-with-gcve-project"
      ~ import_custom_routes_with_public_ip = false -> true
        name                                = "peer-with-gcve-project"
        # (13 unchanged attributes hidden)
    }

  # module.prow_build_nodepool_c4d_highmem_8_localssd.google_container_node_pool.node_pool will be updated in-place
~ resource "google_container_node_pool" "node_pool" {
        id                          = "projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool7-20250528124554315100000001"
        name                        = "pool7-20250528124554315100000001"
        # (10 unchanged attributes hidden)

      ~ autoscaling {
          ~ min_node_count       = 5 -> 10
            # (4 unchanged attributes hidden)
        }

        # (3 unchanged blocks hidden)
    }

Plan: 0 to add, 2 to change, 0 to destroy.
  • ▶️ To apply this plan, comment:
    atlantis apply -d infra/gcp/terraform/k8s-infra-prow-build
  • 🚮 To delete this plan and lock, click here
  • 🔁 To plan this project again, comment:
    atlantis plan -d infra/gcp/terraform/k8s-infra-prow-build

Plan: 0 to add, 2 to change, 0 to destroy.


  • ⏩ To apply all unapplied plans from this Pull Request, comment:
    atlantis apply
  • 🚮 To delete all plans and locks from this Pull Request, comment:
    atlantis unlock

@aojea (Member) commented Jun 3, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 3, 2025
@k8s-ci-robot k8s-ci-robot merged commit 8d5136d into kubernetes:main Jun 3, 2025
7 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 3, 2025
@k8s-infra-ci-robot (Contributor)

Locks and plans deleted for the projects and workspaces modified in this pull request:

  • dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

@BenTheElder (Member)

thanks


5 participants