[Feature] Support for planned configuration changes in maintenance windows #4530

Open
@audunsolemdal

Description

Is your feature request related to a problem? Please describe.

Today, making certain changes to node pools causes the node pools to be recreated immediately.

Some examples involve changing properties such as:

  • vm_size
  • host_encryption_enabled
  • linux_os_config
  • max_pods

At the cluster level, changing certain settings such as node_os_upgrade_channel triggers an immediate re-image of all cluster nodes, more or less all at once.

Changing such settings during working hours may lead to unnecessary downtime for our pods if they do not have pod disruption budgets configured.

Describe the solution you'd like

Ideally I would like to have an option to schedule all my planned configuration changes to be applied in a maintenance window.

I'm not sure what the actual API implementation should look like; perhaps something like this for Terraform:

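    # Proposed syntax: node_pool_upgrade_settings and cluster_upgrade_settings are hypothetical.
    maintenance_window_auto_upgrade {
      frequency   = "Weekly"
      interval    = 1
      duration    = 4
      day_of_week = "Monday"
      start_time  = "01:30"

      dynamic "node_pool_upgrade_settings" {
        for_each = var.node_pools
        content {
          # Inside a dynamic block the iterator is named after the block label, not `each`.
          node_pool_id = node_pool_upgrade_settings.value.id
          update_fields = {
            max_pods = 75
            vm_size  = "Standard_E2ds_v5"
          }
        }
      }

      cluster_upgrade_settings {
        update_fields = {
          node_os_upgrade_channel = "NodeImage"
        }
      }
    }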
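
To make the intended window semantics concrete, here is a minimal Python sketch of how a weekly window like the one above (Monday, 01:30, 4 hours) could be evaluated. The function name `in_weekly_window` and its parameters are hypothetical, it assumes `interval = 1`, and it ignores time zones; it is not how AKS evaluates maintenance windows.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def in_weekly_window(now, day_of_week="Monday", start_time="01:30",
                     duration_hours=4):
    """Return True if `now` falls inside the weekly window.

    Hypothetical helper: assumes a weekly frequency with interval 1 and
    a naive (time-zone-free) datetime.
    """
    start_hour, start_minute = (int(part) for part in start_time.split(":"))

    # Step back to the configured weekday in the current week.
    days_back = (now.weekday() - WEEKDAYS.index(day_of_week)) % 7
    start = (now - timedelta(days=days_back)).replace(
        hour=start_hour, minute=start_minute, second=0, microsecond=0)

    # If that start is still in the future, the relevant window is last week's.
    if start > now:
        start -= timedelta(days=7)

    # The window is half-open: [start, start + duration).
    return now < start + timedelta(hours=duration_hours)
```

With these defaults, `in_weekly_window(datetime(2024, 1, 1, 2, 0))` is true because 1 January 2024 is a Monday and 02:00 lies between 01:30 and 05:30. A scheduler would hold changes queued until this check passes.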

Describe alternatives you've considered

Running az aks nodepool update commands via scheduled GitHub Actions workflows.

Some downsides to this:

  • I would need to manage my clusters with the Azure CLI in addition to Terraform/Bicep.
  • We want the configuration changes to be applied "only if" a new AKS image version or node OS image is available. This is to minimize the number of times the nodes are re-imaged and our workloads face possible disruption.
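
The "only if" condition in the last bullet can be sketched as a small gate. This is an illustration under assumptions, not an existing AKS or Azure CLI feature: `NodePoolState`, `should_apply_pending_changes`, and the image names are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class NodePoolState:
    """Hypothetical snapshot of a node pool, for illustration only."""
    current_image: str
    latest_image: str
    pending_changes: dict = field(default_factory=dict)


def should_apply_pending_changes(pool, in_window):
    """Apply queued changes only when a re-image would happen anyway."""
    if not pool.pending_changes:
        return False  # nothing queued
    if not in_window:
        return False  # outside the maintenance window
    # Piggyback on an image upgrade so nodes are re-imaged only once.
    return pool.current_image != pool.latest_image


# Placeholder image names, not real AKS node image versions.
pool = NodePoolState(
    current_image="image-v1",
    latest_image="image-v2",
    pending_changes={"max_pods": 75},
)
```

Here `should_apply_pending_changes(pool, in_window=True)` is true because a newer image exists, so the queued `max_pods` change would ride along with the image upgrade instead of causing a separate re-image.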

