This repository has been archived by the owner on Aug 12, 2024. It is now read-only.

Introduce framework to allow emitting Prometheus metrics from the plain provisioner #266

Open
anik120 opened this issue Apr 26, 2022 · 7 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@anik120
Contributor

anik120 commented Apr 26, 2022

No description provided.

@timflannagan timflannagan added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 26, 2022
@timflannagan timflannagan added this to the backlog milestone Apr 26, 2022
@joelanford
Member

Controller-runtime already has a metrics server built in, so I'd push back a little on the need for an entire framework (since I think we already have most of one).

Also, I'm going to make yet another plug for kubebuilder/operator-sdk, which has a bunch of niceties around metrics (e.g. scaffolding that includes kube-rbac-proxy to protect the metrics endpoint, and a ServiceMonitor object that can be used to automatically configure a prometheus-operator controlled Prometheus instance to scrape our metrics).

@anik120
Contributor Author

anik120 commented Apr 28, 2022

@joelanford if (read: when) we want these metrics to be accessible outside the cluster via a Service, so that we can collect metrics from all the clusters running Bundles, I was under the impression that it's a security risk to expose the default port the controller uses (i.e. the port where the controller-runtime metrics are published, at /metrics). I'm guessing that's why kubebuilder/SDK projects include kube-rbac-proxy.

Also, I'm going to make yet another plug for kubebuilder/operator-sdk

I feel like I'm missing some context here, but why wasn't kubebuilder/operator-sdk used? Use cases like "automatically configure a prometheus-operator controlled prometheus instance to scrape our metrics" are going to come up when we go live in prod. If there's one lesson from OLM, it's that we kept manually trying to keep up with these things, and therefore kept missing edge cases, introducing bugs, etc.

Metrics is just one component; I'm guessing there will be other components that bring up the same discussion.

cc: @timflannagan @tylerslaton @exdx, in case you guys have the context I'm missing.

@anik120
Contributor Author

anik120 commented Apr 29, 2022

From the Kubebuilder metrics guide:

By default, controller-runtime builds a global prometheus registry and publishes a collection of performance metrics for each controller. These metrics are protected by kube-rbac-proxy by default if using kubebuilder. Kubebuilder v2.2.0+ scaffolds a clusterrole which can be found at config/rbac/auth_proxy_client_clusterrole.yaml.

From the kube-rbac-proxy project:

Motivation

I developed this proxy in order to be able to protect Prometheus metrics endpoints. In a scenario, where an attacker might obtain full control over a Pod, that attacker would have the ability to discover a lot of information about the workload as well as the current load of the respective workload. This information could originate for example from the node-exporter and kube-state-metrics. Both of those metric sources can commonly be found in Prometheus monitoring stacks on Kubernetes.

This project was created to specifically solve the above problem, however, I felt there is a larger need for such a proxy in general.

@timflannagan
Contributor

Brief note from the upstream OLM working group: let's work towards a model where we invest in metrics sooner-rather-than-later. It sounded like we agreed that c-r metrics should be sufficient in the short term, assuming we back those metrics behind some RBAC/authz/etc. (e.g. kube-rbac-proxy). We can invest further in determining which metrics are needed later on, once we iron out the API surface for the core rukpak APIs and the behavior and functionality present in the plain provisioner implementation.

@github-actions

github-actions bot commented Sep 7, 2022

This issue has become stale because it has been open 60 days with no activity. The maintainers of this repo will remove this label during issue triage. Adding the lifecycle/frozen label will cause this issue to ignore lifecycle events.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2022
@tylerslaton
Contributor

Now that some of the more focused-on features have landed in the RukPak project, we should consider coming back to this. @anik120 was kind enough to open this issue as well as create a PR for it, which is linked above. To get this work over the finish line, we should re-evaluate that closed PR (which was closed mainly for age), rebase it, and think about what is needed to get it merged.

Brief note from the upstream OLM working group: let's work towards a model where we invest in metrics sooner-rather-than-later.

This should also help us focus on this point in the Deppy project, since we can leverage the work done here there.

@tylerslaton tylerslaton added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Oct 13, 2022
@github-actions

This issue has become stale because it has been open 60 days with no activity. The maintainers of this repo will remove this label during issue triage or it will be removed automatically after an update. Adding the lifecycle/frozen label will cause this issue to ignore lifecycle events.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 13, 2022

4 participants