
Resources are sometimes manipulated with the wrong API group #6220

Closed
@pjestin-sym

Description


Bug Report

I have a Helm operator that installs releases in multiple namespaces in my K8s cluster. It works mostly fine; however, sometimes, seemingly at random, a release fails, and the operator logs errors like the ones below.

It seems that the operator is trying to get the correct resource, but from the wrong API group. I don't know how this could happen, but the operator sometimes appears to mix up the API groups of different resources.

In the example below, the Helm chart being installed has only two resources (a rough sketch of the templates follows this list):

  • A Deployment in API group apps
  • A ConfigMap in API group ""
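
For illustration, the two templates look roughly like this (the names, labels, and image are placeholders, not the real chart contents):

apiVersion: apps/v1              # API group "apps"
kind: Deployment
metadata:
  name: xpbridge                 # placeholder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: xpbridge
  template:
    metadata:
      labels:
        app: xpbridge
    spec:
      containers:
        - name: xpbridge
          image: example/xpbridge:latest   # placeholder
---
apiVersion: v1                   # core API group ""
kind: ConfigMap
metadata:
  name: xpodbridge-config        # placeholder
data:
  key: value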

Sometimes, at random, the operator will try to manipulate either a Deployment in API group "" or a ConfigMap in API group apps. This fails the release, because Helm tries to manipulate resources that do not exist. When the release is retried, it might fail again (possibly on a different resource) or it might succeed.

Eventually, all resources are reconciled properly, but as a result reconciliation takes significantly more time.

What did you do?

  • Define a Helm chart with 2 resources
  • Use the Operator SDK Helm operator to reconcile Helm releases in multiple namespaces (a watches.yaml sketch follows this list)
  • Check the operator pod logs
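
For reference, the operator maps its custom resource to the chart through a watches.yaml along these lines (the group, version, kind, and chart path are placeholders; the real values are project-specific):

- group: example.my.domain
  version: v1alpha1
  kind: XpodBridge
  chart: helm-charts/xpodbridge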

What did you expect to see?

The Helm releases are reconciled successfully with no errors.

What did you see instead? Under which circumstances?

The following errors appear:

could not get object: configmaps.apps "tenant-50139-xpodbridge" is forbidden: User "system:serviceaccount:xpod-op:manager" cannot get resource "configmaps" in API group "apps" in the namespace "tenant-50139"
could not get object: deployments "xpbridge" is forbidden: User "system:serviceaccount:xpod-op:manager" cannot get resource "deployments" in API group "" in the namespace "tenant-50262"
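
The "forbidden" responses are consistent with the manager's RBAC being scoped per API group: the role presumably allows configmaps only in the core group and deployments only in apps, so a request sent to the wrong group is denied instead of returning NotFound. A minimal sketch of such a rule set (the role name and namespace are assumptions):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: manager-role             # assumed name
  namespace: tenant-50139
rules:
  - apiGroups: [""]              # core group, where ConfigMaps live
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["apps"]          # where Deployments live
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]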

Environment

Operator type:

/language helm

Kubernetes cluster type:

Google Kubernetes Engine

$ operator-sdk version

"v1.26.0", commit: "cbeec475e4612e19f1047ff7014342afe93f60d2", kubernetes version: "1.25.0", go version: "go1.19.3", GOOS: "linux", GOARCH: "amd64"

Docker image: quay.io/operator-framework/helm-operator:v1.26.0

(Note that this also happens with operator-sdk 1.19.1.)

$ kubectl version

Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.5-gke.600", GitCommit:"fb4964ee848bc4d25d42d60386c731836059d1d8", GitTreeState:"clean", BuildDate:"2022-09-22T09:24:55Z", GoVersion:"go1.18.6b7", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

  • The randomness seems to point to a race condition
  • The issue could be in Helm itself or in the Kubernetes Go client; I'm not sure.


Labels

language/helm: Issue is related to a Helm operator project
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
