client#List with client.InNamespace ListOption requires permissions at cluster-level #2608

Closed
jpkrohling opened this issue Feb 28, 2020 · 9 comments
Labels: language/go (Issue is related to a Go operator project), triage/needs-information (Indicates an issue needs more information in order to work on it)

jpkrohling commented Feb 28, 2020

Bug Report

When an operator has permissions for a single namespace, using the client provided by the manager makes the client request objects at the cluster level, causing the operator to hang indefinitely, with messages like the following shown every second:

E0228 10:28:38.957210       1 reflector.go:123] pkg/mod/k8s.io/client-go@v0.0.0-20191016111102-bec269661e48/tools/cache/reflector.go:96: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:observability:jaeger-operator" cannot list resource "secrets" in API group "" at the cluster scope

And here's the code that triggered it:

	opts := []client.ListOption{
		client.InNamespace(jaeger.Namespace),
		client.MatchingLabels(map[string]string{
			"app.kubernetes.io/instance":   jaeger.Name,
			"app.kubernetes.io/managed-by": "jaeger-operator",
		}),
	}
	list := &corev1.SecretList{}
	if err := r.client.List(ctx, list, opts...); err != nil {
		return tracing.HandleError(err, span)
	}

Source: https://github.com/jaegertracing/jaeger-operator/blob/c9147e7ec923113b2dbeafc917c8eb0070c7dc8e/pkg/controller/jaeger/secret.go#L21-L31

Note that jaeger.Namespace does contain a valid, existing namespace (observability).

As a workaround, I can bypass the cache by using mgr.GetAPIReader() instead of mgr.GetClient() for the list operations.

What did you expect to see?

What did you see instead? Under which circumstances?

  • The operator hangs indefinitely, with an error message being shown every second, blocking my reconciliation loop

Environment

  • operator-sdk version: v0.15.1
  • go version: go1.13.4
  • Kubernetes version information: v1.17.3
  • Kubernetes cluster kind: minikube v1.7.3
  • Are you writing your operator in ansible, helm, or go? go

Possible Solution

  • Added a new field to the reconciler to hold mgr.GetAPIReader(), and used that whenever a #List operation is needed (see the sketch below)
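
For illustration, a minimal sketch of that workaround; the type and function names (ReconcileJaeger, apiReader, listSecrets) are placeholders rather than the actual jaeger-operator code. The point is only that reads done through the reader from mgr.GetAPIReader() go straight to the API server, so no informer (and no cluster-scoped List/Watch) is started behind the scenes.

	package jaeger

	import (
		"context"

		corev1 "k8s.io/api/core/v1"
		"sigs.k8s.io/controller-runtime/pkg/client"
		"sigs.k8s.io/controller-runtime/pkg/manager"
	)

	// ReconcileJaeger keeps both the cache-backed client and an uncached reader.
	type ReconcileJaeger struct {
		client    client.Client // from mgr.GetClient(): reads are served from the cache
		apiReader client.Reader // from mgr.GetAPIReader(): reads go to the API server
	}

	func newReconciler(mgr manager.Manager) *ReconcileJaeger {
		return &ReconcileJaeger{
			client:    mgr.GetClient(),
			apiReader: mgr.GetAPIReader(),
		}
	}

	// listSecrets lists secrets in a single namespace without touching the cache,
	// so it only needs namespace-scoped RBAC.
	func (r *ReconcileJaeger) listSecrets(ctx context.Context, namespace string) (*corev1.SecretList, error) {
		list := &corev1.SecretList{}
		opts := []client.ListOption{client.InNamespace(namespace)}
		if err := r.apiReader.List(ctx, list, opts...); err != nil {
			return nil, err
		}
		return list, nil
	}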

Additional context
Causes: jaegertracing/jaeger-operator#935
Possibly (partially) related: kubernetes-sigs/controller-runtime#550

jpkrohling (Author) commented

I'm opening this in this repository because I'm not quite sure where the problem lies. It might be in the way the operator configures the client, or perhaps in how the controller-runtime handles the cache.

jpkrohling (Author) commented

This error isn't propagated to my operator, so I can't just print the stack there. It does not happen during the startup of my operator, but rather, during reconciliation.

camilamacedo86 (Contributor) commented

Hi @jpkrohling,

Are you able to reproduce this / create a POC by using the Go Memcached Sample?

Because if this information is true:

When an operator has permissions for a single namespace, using the client provided by the manager makes the client request objects at the cluster level, causing the operator to hang indefinitely, with messages like the following shown every second:

Then the same behaviour should be reproducible there. Am I right? Could you please provide the steps so that we are able to reproduce it?

camilamacedo86 added the language/go and triage/needs-information labels Mar 4, 2020
jpkrohling (Author) commented

It should be easy to reproduce with the memcached example. I can't work on that right now, but I'll try to get a reproducer before the end of the month.

joelanford (Member) commented

@jpkrohling I haven't tried reproducing this, but my first thought is that if the manager is set up to watch all namespaces, then (I think) it configures the client to do List at the cluster scope. If that's what's happening, this is a controller-runtime thing.

estroz added this to the Backlog milestone Apr 6, 2020
joelanford (Member) commented

@jpkrohling Can you confirm if your operator is watching all namespaces when you encounter this issue?

If the controller-runtime's manager.Options.Namespace field is set to "" (i.e. all namespaces), then the controller-runtime cache will populate itself using informers that List and Watch at the cluster scope.

The controller-runtime client (likely the one you're using for this .List() call that causes the issue) performs reads (i.e. Get and List calls) against the cache. When the cache receives a Get or a List call for an object type that doesn't already exist in the cache, it starts up another informer for that type using the scope of the cache. So if the manager is cluster-scoped, the new informers created by the cache will be cluster-scoped, regardless of the ListOptions passed to the List call.
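
A minimal sketch of that distinction, written against controller-runtime versions from around the time of this issue (where manager.Options still exposes a Namespace field); the WATCH_NAMESPACE environment variable follows the SDK scaffolding convention, but treat the snippet as an illustration rather than the exact generated code:

	package main

	import (
		"os"

		"sigs.k8s.io/controller-runtime/pkg/client/config"
		"sigs.k8s.io/controller-runtime/pkg/manager"
	)

	func newManager() (manager.Manager, error) {
		cfg, err := config.GetConfig()
		if err != nil {
			return nil, err
		}

		// Empty means "watch all namespaces": every informer the cache starts,
		// including the ones created lazily by the first cached Get/List for a
		// type, lists and watches at cluster scope and therefore needs
		// cluster-level RBAC. A single namespace here makes those informers
		// namespace-scoped, so a Role/RoleBinding in that namespace is enough.
		watchNamespace := os.Getenv("WATCH_NAMESPACE")

		return manager.New(cfg, manager.Options{
			Namespace: watchNamespace,
		})
	}

That is also why the log above says "at the cluster scope" even though the List call passed client.InNamespace: the ListOptions filter what comes back from the cache, but they don't change the scope of the informer that backs it.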

This issue is related as well: kubernetes-sigs/controller-runtime#244

If the above description of your scenario is correct, I don't think the SDK can help with this problem (other than helping solve this issue upstream).

Thoughts?

jpkrohling (Author) commented

This does sound correct; I'm just not 100% sure I faced this only when WATCH_NAMESPACE is empty. I need a couple of days (or a week) to verify this.

jpkrohling (Author) commented

Just tried it out, and it does indeed happen only when the operator is watching all namespaces. Steps to reproduce:

  1. Generate a new operator (tested with the latest as of now, v0.18.1)
  2. Add the code below (shown after this list) to the reconcile logic
  3. Remove secrets from the role.yaml
  4. Change the role/rolebinding to clusterrole/clusterrolebinding
  5. Create a new CR
	opts := []client.ListOption{
		client.InNamespace("memcached"),
		client.MatchingLabels(map[string]string{
			"app.kubernetes.io/instance":   "my-instance",
			"app.kubernetes.io/managed-by": "app-operator",
		}),
	}
	list := &corev1.SecretList{}
	err := r.client.List(ctx, list, opts...)

Once the CR is applied, the following is seen in the logs and the reconciliation never finishes:

E0615 09:42:53.398396       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:default:app-operator" cannot list resource "secrets" in API group "" at the cluster scope

joelanford assigned estroz and unassigned joelanford Jul 13, 2020
camilamacedo86 (Contributor) commented Aug 10, 2020

Hi @jpkrohling,

I am closing this one since it is something that needs to be addressed in controller-runtime and was already described in the comment above (#2608 (comment)), which also shows that you were able to confirm it.

However, please feel free to ping us if you think that it should be re-opened.
