Infinite Loop when upgrading system-upgrade-controller due to missing ServiceAccount Annotations #361

Open
@przemytn

Version

v0.15.2

Platform/Architecture

linux-amd64

Describe the bug

When upgrading system-upgrade-controller with Helm in an environment with more than 300 pods, the upgrade enters an infinite loop. The existing ServiceAccount lacks the ownership label and annotations that Helm requires, so Helm refuses to import the resource into the release.

To Reproduce

1. Deploy the system-upgrade-controller ServiceAccount without the Helm ownership label and annotations.
2. Try to upgrade using Helm with a command similar to:

helm upgrade --history-max=5 --install=true --labels=catalog.cattle.io/cluster-repo-name=rancher-charts --namespace=cattle-system --reset-values=true --timeout=5m0s --values=/home/shell/helm/values-system-upgrade-controller-106.0.0.yaml --version=106.0.0 --wait=true system-upgrade-controller /home/shell/helm/system-upgrade-controller-106.0.0.tgz

3. Observe the error and the infinite loop behavior with more than 300 pods.

Expected behavior

The ServiceAccount should include the proper Helm annotations to allow Helm to recognize and manage it during upgrades. The upgrade process should complete normally without entering an infinite loop.

Actual behavior

The current ServiceAccount is defined as:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: system-upgrade-controller
  namespace: cattle-system

This results in the following error during upgrade:

Error: Unable to continue with install: ServiceAccount "system-upgrade-controller" in namespace "cattle-system" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "system-upgrade-controller"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "cattle-system"

The system then enters an infinite loop trying to reconcile this state, which is particularly problematic when the environment has more than 300 pods.
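
To confirm which ownership metadata is missing before retrying the upgrade, the labels and annotations on the existing resource can be printed with kubectl. The following is a minimal sketch; `show_helm_ownership` is a hypothetical helper name introduced here for illustration, not a Helm or kubectl command.

```shell
# Sketch: print the labels and annotations of an existing resource so they
# can be compared with the keys named in the Helm error message.
# show_helm_ownership is a hypothetical helper, not part of Helm or kubectl.
show_helm_ownership() {
  kind="$1"; name="$2"; namespace="$3"
  kubectl get "$kind" "$name" --namespace "$namespace" \
    -o jsonpath='{.metadata.labels}{"\n"}{.metadata.annotations}{"\n"}'
}
```

For the resource in this report, the call would be `show_helm_ownership serviceaccount system-upgrade-controller cattle-system`.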

The correct ServiceAccount should include:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: system-upgrade-controller
  namespace: cattle-system
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: system-upgrade-controller
    meta.helm.sh/release-namespace: cattle-system
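
As a possible stopgap until the manifest itself carries this metadata, the same label and annotations could be applied to the live ServiceAccount with kubectl. This is a sketch under assumptions, not a confirmed fix: `adopt_for_helm` is a hypothetical helper name, the release name and namespace are taken from the error message above, and whether this alone stops the loop has not been verified here.

```shell
# Sketch: add the ownership metadata that the Helm error message asks for.
# adopt_for_helm is a hypothetical helper, not part of Helm or kubectl.
# Arguments: kind, name, namespace, release name, [release namespace].
adopt_for_helm() {
  kind="$1"; name="$2"; namespace="$3"; release="$4"
  # Assumption: the release lives in the same namespace unless a 5th argument is given.
  release_namespace="${5:-$namespace}"
  kubectl label "$kind" "$name" --namespace "$namespace" --overwrite \
    "app.kubernetes.io/managed-by=Helm" &&
  kubectl annotate "$kind" "$name" --namespace "$namespace" --overwrite \
    "meta.helm.sh/release-name=$release" \
    "meta.helm.sh/release-namespace=$release_namespace"
}
```

For this report the call would be `adopt_for_helm serviceaccount system-upgrade-controller cattle-system system-upgrade-controller`, after which the `helm upgrade` command can be retried.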

Additional context

This issue seems to be particularly severe in environments with many pods (300+). The infinite loop appears to be related to Helm's retry mechanism when it cannot properly manage existing resources due to missing annotations. Note that issues on RKE2 charts are currently disabled, so this bug report may need to be submitted through alternative channels.
