Description
Version
v0.15.2
Platform/Architecture
linux-amd64
Describe the bug
Upgrading system-upgrade-controller in an environment with more than 300 pods sends the system into an infinite loop. The existing ServiceAccount lacks the ownership label and annotations that Helm requires, so Helm cannot adopt the resource into the release and the upgrade fails.
To Reproduce
1. Deploy the system-upgrade-controller ServiceAccount without the Helm ownership label and annotations.
2. Try to upgrade using Helm with a command similar to:
   helm upgrade --history-max=5 --install=true --labels=catalog.cattle.io/cluster-repo-name=rancher-charts --namespace=cattle-system --reset-values=true --timeout=5m0s --values=/home/shell/helm/values-system-upgrade-controller-106.0.0.yaml --version=106.0.0 --wait=true system-upgrade-controller /home/shell/helm/system-upgrade-controller-106.0.0.tgz
3. Observe the error and the infinite loop behavior with more than 300 pods.
Expected behavior
The ServiceAccount should carry the Helm ownership label and annotations so that Helm can recognize and manage it during upgrades. The upgrade should then complete normally without entering an infinite loop.
Actual behavior
Current ServiceAccount is defined as:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: system-upgrade-controller
  namespace: cattle-system
This results in the following error during upgrade:
Error: Unable to continue with install: ServiceAccount "system-upgrade-controller" in namespace "cattle-system" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "system-upgrade-controller"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "cattle-system"
The system then enters an infinite loop trying to reconcile this state, which is particularly problematic in environments with more than 300 pods.
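To confirm which ownership metadata is missing on the live object, the labels and annotations of the existing ServiceAccount can be printed. The following is a minimal sketch; the function name `show_ownership_metadata` and the `KUBECTL` override variable are hypothetical helpers introduced here for illustration, not part of any chart or tool.

```shell
# Hypothetical helper: print the labels and annotations of a ServiceAccount.
# $1 = ServiceAccount name, $2 = namespace.
# KUBECTL may be overridden (for example with "echo") to preview the command.
show_ownership_metadata() {
  "${KUBECTL:-kubectl}" get serviceaccount "$1" \
    --namespace "$2" \
    -o jsonpath='{.metadata.labels}{"\n"}{.metadata.annotations}{"\n"}'
}
```

For the resource in this report it would be called as `show_ownership_metadata system-upgrade-controller cattle-system`. If the output shows no `app.kubernetes.io/managed-by` label and no `meta.helm.sh/*` annotations, that matches the validation error above.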
The correct ServiceAccount should include:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: system-upgrade-controller
  namespace: cattle-system
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: system-upgrade-controller
    meta.helm.sh/release-namespace: cattle-system
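As a possible interim workaround until the manifest itself is fixed, the same label and annotations can be applied to the existing ServiceAccount with `kubectl label` and `kubectl annotate`. This is a sketch under the assumption that the release name and namespace are the ones quoted in the error message; the function name `adopt_service_account` and the `KUBECTL` override variable are hypothetical, and this has not been verified against the affected environment.

```shell
# Hypothetical helper: add the Helm ownership metadata to an existing ServiceAccount.
# $1 = ServiceAccount name, $2 = namespace, $3 = Helm release name.
# KUBECTL may be overridden (for example with "echo") to preview the commands.
adopt_service_account() {
  # Label required by Helm's ownership validation.
  "${KUBECTL:-kubectl}" label serviceaccount "$1" \
    --namespace "$2" --overwrite \
    "app.kubernetes.io/managed-by=Helm" &&
  # Annotations tying the resource to the release name and namespace.
  "${KUBECTL:-kubectl}" annotate serviceaccount "$1" \
    --namespace "$2" --overwrite \
    "meta.helm.sh/release-name=$3" \
    "meta.helm.sh/release-namespace=$2"
}
```

For this report the call would be `adopt_service_account system-upgrade-controller cattle-system system-upgrade-controller`, after which the `helm upgrade` command can be retried. Whether this also stops the loop in large environments is not confirmed here.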
Additional context
This issue seems to be particularly severe in environments with many pods (300+). The infinite loop appears to be related to Helm's retry mechanism when it cannot manage existing resources that lack the required ownership label and annotations. Note that issues on the RKE2 charts repository are currently disabled, so this bug report may need to be submitted through alternative channels.