fix arm64 nodepool for kind #8156

upodroid · 2025-06-03T09:38:16Z

The kernel bug appeared in the kind jobs on arm64 nodes, so I changed their OS too.

c4d is in preview, so I'm forcing the nodepool to always run at least 10 nodes of this type per zone. Looking at the nodepool size metrics, it has consistently stayed above 20 per zone for weeks. The autoscaler seems to prefer the c4 pool over the c4d pool, which isn't ideal.
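
For illustration, the per-zone floor described above could look roughly like this on a plain `google_container_node_pool` resource. This is only a sketch: the actual change goes through the repo's `gke-nodepool` module, whose variable names are not shown in this thread, and the resource name, pool name, `max_node_count`, and machine type below are assumptions (the machine type is inferred from the module name `prow_build_nodepool_c4d_highmem_8_localssd`).

```hcl
# Sketch only: not the module code from this PR.
resource "google_container_node_pool" "c4d_example" {
  name     = "example-c4d-pool" # hypothetical name
  cluster  = "prow-build"
  location = "us-central1"      # regional, so node counts apply per zone

  autoscaling {
    # Per-zone floor. With the three zones the plan lists
    # (us-central1-a/b/c), this keeps at least 30 nodes in the pool.
    min_node_count = 10
    max_node_count = 80 # hypothetical ceiling, not taken from the PR
  }

  node_config {
    machine_type = "c4d-highmem-8" # assumed from the module name
  }
}
```

Because `min_node_count` is counted per zone, raising it from 5 to 10 raises the pool-wide floor from 15 to 30 nodes across three zones.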

k8s-ci-robot · 2025-06-03T09:38:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: upodroid

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~infra/gcp/terraform/k8s-infra-prow-build/OWNERS~~ [upodroid]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-infra-ci-robot · 2025-06-03T09:38:39Z

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
~ update in-place
+/- create replacement and then destroy

Terraform will perform the following actions:

  # google_vmwareengine_network_peering.gvce_peering will be updated in-place
~ resource "google_vmwareengine_network_peering" "gvce_peering" {
      ~ export_custom_routes_with_public_ip = false -> true
        id                                  = "projects/k8s-infra-prow-build/locations/global/networkPeerings/peer-with-gcve-project"
      ~ import_custom_routes_with_public_ip = false -> true
        name                                = "peer-with-gcve-project"
        # (13 unchanged attributes hidden)
    }

  # module.prow_build_nodepool_c4d_highmem_8_localssd.google_container_node_pool.node_pool must be replaced
+/- resource "google_container_node_pool" "node_pool" {
      ~ id                          = "projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool7-20250528124554315100000001" -> (known after apply)
      ~ initial_node_count          = 5 -> 1 # forces replacement
      ~ instance_group_urls         = [
          - "https://www.googleapis.com/compute/v1/projects/k8s-infra-prow-build/zones/us-central1-a/instanceGroupManagers/gke-prow-build-pool7-2025052812455431-48d85bd4-grp",
          - "https://www.googleapis.com/compute/v1/projects/k8s-infra-prow-build/zones/us-central1-b/instanceGroupManagers/gke-prow-build-pool7-2025052812455431-f2ab4558-grp",
          - "https://www.googleapis.com/compute/v1/projects/k8s-infra-prow-build/zones/us-central1-c/instanceGroupManagers/gke-prow-build-pool7-2025052812455431-2e4cd1ad-grp",
        ] -> (known after apply)
      ~ managed_instance_group_urls = [
          - "https://www.googleapis.com/compute/beta/projects/k8s-infra-prow-build/zones/us-central1-a/instanceGroups/gke-prow-build-pool7-2025052812455431-48d85bd4-grp",
          - "https://www.googleapis.com/compute/beta/projects/k8s-infra-prow-build/zones/us-central1-b/instanceGroups/gke-prow-build-pool7-2025052812455431-f2ab4558-grp",
          - "https://www.googleapis.com/compute/beta/projects/k8s-infra-prow-build/zones/us-central1-c/instanceGroups/gke-prow-build-pool7-2025052812455431-2e4cd1ad-grp",
        ] -> (known after apply)
      + max_pods_per_node           = (known after apply)
      ~ name                        = "pool7-20250528124554315100000001" -> (known after apply)
      ~ node_count                  = 5 -> (known after apply)
      + operation                   = (known after apply)
      ~ version                     = "1.32.3-gke.1927009" -> (known after apply)
        # (5 unchanged attributes hidden)

      ~ autoscaling {
          ~ location_policy      = "BALANCED" -> (known after apply)
          ~ min_node_count       = 5 -> 10
          - total_max_node_count = 0 -> null
          - total_min_node_count = 0 -> null
            # (1 unchanged attribute hidden)
        }

      ~ network_config (known after apply)

      ~ node_config {
          ~ effective_taints            = [] -> (known after apply)
          - enable_confidential_storage = false -> null
          ~ labels                      = {} -> (known after apply)
          ~ local_ssd_count             = 0 -> (known after apply)
          ~ logging_variant             = "DEFAULT" -> (known after apply)
          + min_cpu_platform            = (known after apply)
          - resource_labels             = {
              - "goog-gke-node-pool-provisioning-model" = "on-demand"
            } -> null
          - resource_manager_tags       = {} -> null
          - storage_pools               = [] -> null
          - tags                        = [] -> null
            # (13 unchanged attributes hidden)

          ~ confidential_nodes (known after apply)

          - ephemeral_storage_local_ssd_config {
              - local_ssd_count = 1 -> null
            }

          ~ gcfs_config (known after apply)

          ~ guest_accelerator (known after apply)

          ~ kubelet_config (known after apply)
          - kubelet_config {
              - allowed_unsafe_sysctls                 = [] -> null
              - container_log_max_files                = 0 -> null
              - cpu_cfs_quota                          = false -> null
              - image_gc_high_threshold_percent        = 0 -> null
              - image_gc_low_threshold_percent         = 0 -> null
              - insecure_kubelet_readonly_port_enabled = "TRUE" -> null
              - pod_pids_limit                         = 0 -> null
                # (5 unchanged attributes hidden)
            }

          ~ shielded_instance_config (known after apply)
          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = false -> null
            }

          ~ windows_node_config (known after apply)
          - windows_node_config {
                # (1 unchanged attribute hidden)
            }

            # (1 unchanged block hidden)
        }

      ~ upgrade_settings (known after apply)
      - upgrade_settings {
          - max_surge       = 1 -> null
          - max_unavailable = 0 -> null
          - strategy        = "SURGE" -> null
        }

        # (1 unchanged block hidden)
    }

Plan: 1 to add, 1 to change, 1 to destroy.

▶️ To apply this plan, comment:

atlantis apply -d infra/gcp/terraform/k8s-infra-prow-build

🚮 To delete this plan and lock, click here

🔁 To plan this project again, comment:

atlantis plan -d infra/gcp/terraform/k8s-infra-prow-build

Plan: 1 to add, 1 to change, 1 to destroy.

⏩ To apply all unapplied plans from this Pull Request, comment:
```
atlantis apply
```
🚮 To delete all plans and locks from this Pull Request, comment:
```
atlantis unlock
```

upodroid · 2025-06-03T10:24:17Z

atlantis plan

k8s-infra-ci-robot · 2025-06-03T10:24:28Z

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Plan Error

running 'sh -c' '/atlantis/bin/terraform1.12.1 init -input=false -upgrade' in '/atlantis/repos/kubernetes/k8s.io/8156/default/infra/gcp/terraform/k8s-infra-prow-build': exit status 1
Initializing the backend...
Upgrading modules...
Downloading registry.terraform.io/terraform-google-modules/iam/google 8.1.0 for iam...
- iam in .terraform/modules/iam/modules/projects_iam
- iam.helper in .terraform/modules/iam/modules/helper
- project in ../modules/gke-project
- prow_build_cluster in ../modules/gke-cluster
- prow_build_nodepool_c4_highmem_8_localssd in ../modules/gke-nodepool
- prow_build_nodepool_c4a_highmem_8_localssd in ../modules/gke-nodepool
- prow_build_nodepool_c4d_highmem_8_localssd in ../modules/gke-nodepool
- prow_build_nodepool_n1_highmem_8_localssd in ../modules/gke-nodepool
Downloading git::https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git?ref=v39.0.0&depth=1 for sig_node_node_pool_1_n4_highmem_8...
- sig_node_node_pool_1_n4_highmem_8 in .terraform/modules/sig_node_node_pool_1_n4_highmem_8/modules/gke-nodepool
- workload_identity_service_accounts in ../modules/workload-identity-service-account
Initializing provider plugins...
- Finding hashicorp/google-beta versions matching ">= 6.28.0, ~> 6.31.0, < 7.0.0"...
- Finding hashicorp/google versions matching ">= 3.53.0, >= 6.28.0, ~> 6.31.0, < 7.0.0"...
- Installing hashicorp/google-beta v6.31.1...
- Installing hashicorp/google v6.31.1...
- Installed hashicorp/google v6.31.1 (signed by HashiCorp)
╷
│ Error: Failed to install provider
│ 
│ Error while installing hashicorp/google-beta v6.31.1: open
│ /atlantis/plugin-cache/registry.terraform.io/hashicorp/google-beta/6.31.1/linux_amd64/terraform-provider-google-beta_v6.31.1_x5:
│ text file busy
╵

upodroid · 2025-06-03T10:39:27Z

atlantis plan

k8s-infra-ci-robot · 2025-06-03T10:39:48Z

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
~ update in-place

Terraform will perform the following actions:

  # google_vmwareengine_network_peering.gvce_peering will be updated in-place
~ resource "google_vmwareengine_network_peering" "gvce_peering" {
      ~ export_custom_routes_with_public_ip = false -> true
        id                                  = "projects/k8s-infra-prow-build/locations/global/networkPeerings/peer-with-gcve-project"
      ~ import_custom_routes_with_public_ip = false -> true
        name                                = "peer-with-gcve-project"
        # (13 unchanged attributes hidden)
    }

  # module.prow_build_nodepool_c4d_highmem_8_localssd.google_container_node_pool.node_pool will be updated in-place
~ resource "google_container_node_pool" "node_pool" {
        id                          = "projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool7-20250528124554315100000001"
        name                        = "pool7-20250528124554315100000001"
        # (10 unchanged attributes hidden)

      ~ autoscaling {
          ~ min_node_count       = 5 -> 10
            # (4 unchanged attributes hidden)
        }

        # (3 unchanged blocks hidden)
    }

Plan: 0 to add, 2 to change, 0 to destroy.

▶️ To apply this plan, comment:

atlantis apply -d infra/gcp/terraform/k8s-infra-prow-build

🚮 To delete this plan and lock, click here

🔁 To plan this project again, comment:

atlantis plan -d infra/gcp/terraform/k8s-infra-prow-build

Plan: 0 to add, 2 to change, 0 to destroy.

⏩ To apply all unapplied plans from this Pull Request, comment:
```
atlantis apply
```
🚮 To delete all plans and locks from this Pull Request, comment:
```
atlantis unlock
```
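
The first plan in this thread wanted to replace the c4d pool because `initial_node_count` changed from 5 to 1, which the provider treats as forcing replacement; this later plan only updates `min_node_count` in place. The thread does not show how that difference was resolved. One common way to keep a drifting `initial_node_count` from triggering a replacement is Terraform's `lifecycle` `ignore_changes` meta-argument, sketched below. It is an assumption that something like this applies here, and the resource name, pool name, and `max_node_count` are hypothetical.

```hcl
# Sketch only: one possible way to avoid the forced replacement,
# not necessarily what was done in this PR.
resource "google_container_node_pool" "node_pool_example" {
  name               = "example-pool" # hypothetical name
  cluster            = "prow-build"
  location           = "us-central1"
  initial_node_count = 1

  autoscaling {
    min_node_count = 10
    max_node_count = 80 # hypothetical ceiling, not taken from the PR
  }

  lifecycle {
    # initial_node_count is only meaningful at creation time, so later
    # differences between config and state are ignored instead of
    # destroying and recreating the pool.
    ignore_changes = [initial_node_count]
  }
}
```

The alternative is to keep the configured `initial_node_count` equal to the value already recorded in state (5 in the first plan), so that no diff is produced at all.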

aojea · 2025-06-03T11:25:26Z

/lgtm

k8s-infra-ci-robot · 2025-06-03T11:26:41Z

Locks and plans deleted for the projects and workspaces modified in this pull request:

dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

BenTheElder · 2025-06-03T16:34:23Z

thanks

fix arm64 nodepool for kind

93e2144

k8s-ci-robot requested review from ameukam, aojea and BenTheElder June 3, 2025 09:38

k8s-ci-robot assigned aojea Jun 3, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 3, 2025

k8s-ci-robot merged commit 8d5136d into kubernetes:main Jun 3, 2025
7 checks passed

k8s-ci-robot added this to the v1.34 milestone Jun 3, 2025
