Description
Bug Report
I have a Helm operator that installs releases in multiple namespaces in my K8s cluster. It mostly works fine; however, sometimes, seemingly at random, a release fails, and the operator logs the error shown below.
It looks like the operator is requesting the correct resource kind but from the wrong API group. I don't know how this can happen, but the operator appears to sometimes mix up the API groups of the resources it manages.
In the example below, the Helm chart being installed has only 2 resources (a minimal sketch of the chart templates follows the list):
- A `Deployment` in API group `apps`
- A `ConfigMap` in API group `""`
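For reference, here is a minimal sketch of what those two chart templates could look like; the names, labels, and image are illustrative placeholders (loosely based on the resource names in the errors below), not the actual chart:

```yaml
# templates/deployment.yaml -- the Deployment lives in the "apps" API group
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xpbridge               # illustrative name taken from the error logs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: xpbridge
  template:
    metadata:
      labels:
        app: xpbridge
    spec:
      containers:
        - name: xpbridge
          image: nginx:1.25    # placeholder image
---
# templates/configmap.yaml -- the ConfigMap lives in the core ("") API group
apiVersion: v1
kind: ConfigMap
metadata:
  name: xpodbridge             # illustrative name
data:
  example.key: "example-value" # placeholder data
```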
Sometimes, at random, the operator will try to manipulate either a `Deployment` in API group `""` or a `ConfigMap` in API group `apps`. This fails the release, as Helm tries to manipulate resources that do not exist. When the release is retried, it might fail again (a different resource might be the problem) or it might succeed.
Eventually, all resources are properly reconciled, but the repeated failures and retries make reconciliation take significantly longer.
What did you do?
- Define a Helm chart with 2 resources
- Use the operator-sdk Helm operator to reconcile Helm releases in multiple namespaces (see the watches.yaml sketch after this list)
- Check the operator pod logs
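For context, the operator reconciles the chart through a standard Helm-operator watches.yaml, roughly along these lines (the group, version, kind, and chart path are placeholders for illustration, not the actual values):

```yaml
# watches.yaml -- maps a custom resource to the Helm chart the operator reconciles;
# one custom resource instance exists per tenant namespace
- group: example.com            # placeholder CRD group
  version: v1alpha1             # placeholder version
  kind: XPodBridge              # placeholder kind
  chart: helm-charts/xpodbridge # placeholder chart path inside the operator image
```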
What did you expect to see?
The Helm releases are reconciled successfully with no errors.
What did you see instead? Under which circumstances?
The following errors appear:
could not get object: configmaps.apps "tenant-50139-xpodbridge" is forbidden: User "system:serviceaccount:xpod-op:manager" cannot get resource "configmaps" in API group "apps" in the namespace "tenant-50139"
could not get object: deployments "xpbridge" is forbidden: User "system:serviceaccount:xpod-op:manager" cannot get resource "deployments" in API group "" in the namespace "tenant-50262"
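Note that the manager's RBAC presumably does grant access to these resources in their correct API groups (otherwise the releases would never succeed), so the requests above fail only because the group is wrong. A simplified sketch of the relevant rules, assuming the usual operator-sdk scaffolding (not the actual generated role):

```yaml
# Simplified sketch of the RBAC rules bound to system:serviceaccount:xpod-op:manager.
# Deployments are granted in the "apps" group and ConfigMaps in the core ("") group,
# which is why a request for configmaps in group "apps" (or deployments in group "")
# is rejected as forbidden.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: manager-role            # placeholder name
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```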
Environment
Operator type:
/language helm
Kubernetes cluster type:
Google Kubernetes Engine
$ operator-sdk version
"v1.26.0", commit: "cbeec475e4612e19f1047ff7014342afe93f60d2", kubernetes version: "1.25.0", go version: "go1.19.3", GOOS: "linux", GOARCH: "amd64"
Docker image: quay.io/operator-framework/helm-operator:v1.26.0
(Note that this also happens with operator-sdk 1.19.1.)
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.5-gke.600", GitCommit:"fb4964ee848bc4d25d42d60386c731836059d1d8", GitTreeState:"clean", BuildDate:"2022-09-22T09:24:55Z", GoVersion:"go1.18.6b7", Compiler:"gc", Platform:"linux/amd64"}
Possible Solution
- The randomness seems to point to a race condition
- The issue could be related to Helm or to the Kubernetes Go client; I'm not sure.