Consolidate all exposed Prometheus Metrics in KEDA Operator #3919

Closed
zroubalik opened this issue Nov 29, 2022 · 33 comments · Fixed by #3861
Labels
feature-request (All issues for new features that have not been committed to), needs-discussion

Comments

@zroubalik
Member

zroubalik commented Nov 29, 2022

Proposal

Currently we expose Prometheus metrics on scaling and errors in the Metrics Adapter:
https://keda.sh/docs/2.8/operate/prometheus/#metrics-adapter

We also expose some metrics in the Operator, and there is a new feature to expose additional metrics there as well:
#3663

Old situation

### Metrics Server
keda_metrics_adapter_scaler_error_totals 
keda_metrics_adapter_scaled_object_error_totals
keda_metrics_adapter_scaler_errors
keda_metrics_adapter_scaler_metrics_value

### Operator
operator-sdk related metrics
# newly introduced:
keda_operator_resource_totals 
keda_operator_trigger_totals

New situation

The Operator is now the main source of metrics; the Metrics Server metrics are kept as deprecated for some time, until #3930 is resolved.
Some metric names were renamed.

### Operator
keda_scaler_error_totals 
keda_scaled_object_error_totals
keda_scaler_errors
keda_scaler_metrics_value
keda_resource_totals 
keda_trigger_totals

operator-sdk related metrics


### Metrics Server -- DEPRECATED!
keda_metrics_adapter_scaler_error_totals 
keda_metrics_adapter_scaled_object_error_totals
keda_metrics_adapter_scaler_errors
keda_metrics_adapter_scaler_metrics_value

This will help us in the future with the multi-tenant story, etc.

Related:
#3930
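
For illustration, here is a minimal client_golang sketch of how the renamed operator metrics listed above could be registered on the operator's existing Prometheus registry. The variable names, label sets, and the use of the controller-runtime registry are assumptions for the sketch, not KEDA's actual code.

```go
package prommetrics

import (
	"github.com/prometheus/client_golang/prometheus"
	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	// keda_scaler_errors -- errors per scaler (label set is hypothetical).
	scalerErrors = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Namespace: "keda",
			Name:      "scaler_errors",
			Help:      "Number of scaler errors.",
		},
		[]string{"namespace", "scaledObject", "scaler"},
	)

	// keda_scaler_metrics_value -- last metric value reported by a scaler.
	scalerMetricsValue = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "keda",
			Name:      "scaler_metrics_value",
			Help:      "Metric value used for autoscaling.",
		},
		[]string{"namespace", "scaledObject", "scaler", "metric"},
	)
)

func init() {
	// Registering on the controller-runtime registry exposes these series on the
	// operator's existing metrics endpoint, next to the operator-sdk metrics.
	ctrlmetrics.Registry.MustRegister(scalerErrors, scalerMetricsValue)
}
```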

@zroubalik
Member Author

@kedacore/keda-maintainers @v-shenoy PTAL^

@v-shenoy
Contributor

v-shenoy commented Nov 29, 2022

I agree with this proposal. Having them all in one place makes sense, and the operator feels like the right place.

Considering the next release is going to be a breaking one anyway with the HPA API versions, might as well get any other breaking changes done along with it.

@tomkerkhove
Member

This will be a breaking change, but I think it is beneficial and we should do this change now. It will help us in the future with multi-tenant story etc.

Sure but it's breaking :) So we can add them on the operator with the new names, but we cannot remove them on the metric server.

@tomkerkhove
Member

Considering the next release is going to be a breaking one anyway with the HPA API versions, might as well get any other breaking changes done along with it.

I tend to disagree, because this is not pushed on us by Kubernetes.

@v-shenoy
Contributor

v-shenoy commented Nov 29, 2022

This will be a breaking change, but I think it is beneficial and we should do this change now. It will help us in the future with multi-tenant story etc.

Sure but it's breaking :) So we can add them on the operator with the new names, but we cannot remove them on the metric server.

Wouldn't renaming the ones on the operator (except the ones not released) also be breaking? Also, this is obviously a maintainer's call on what breaking changes should / shouldn't be done. So, I leave that to all of you.

@tomkerkhove
Member

tomkerkhove commented Nov 29, 2022

Based on this:

My suggestion is to move all these metrics to Operator and slightly rename them:

It looks like these metrics are in a new place, so we can do anything we want :) But the ones we have today must be kept around.

@zroubalik
Member Author

Honestly, it would be hard to keep those on the Metrics Server in a sensible way, especially if we try to tackle the semi-multitenancy.

@v-shenoy
Contributor

v-shenoy commented Nov 29, 2022

Based on this:

My suggestion is to move all these metrics to Operator and slightly rename them:

It looks like these metrics are in a new place, so we can do anything we want :) But the ones we have today must be kept around.

Right, I also forgot that the operator currently doesn't expose anything other than the default operator-sdk metrics.

@zroubalik
Member Author

But #3861 is changing the way we are getting metrics (and exposing them as Prometheus metrics). It is no longer happening in the Metrics Server; it is in the Operator.

@JorTurFer
Member

The problem is that we are going to break the Metrics Server metrics, because we are moving the logic that generates them (the metric querying) to the operator. I think we could reduce the impact if we add scraping of the operator as part of the Helm chart, but the change is there and we need it.

@zroubalik
Member Author

IMHO we are not doing a breaking change per se. We are still exposing the metrics, just from a different endpoint and with slightly different names. I am okay with doing this if the change is properly documented.

@v-shenoy
Contributor

But #3861 is changing the way we are getting metrics (and exposing them as Prometheus metrics). It is no longer happening in the Metrics Server; it is in the Operator.

So, if I understand this right, the operator will basically be doing all of the heavy lifting going forward: not just reconciling the CRDs, but also fetching the metrics for the HPA and caching them. And the metrics server will just query the operator?

@JorTurFer
Member

I think this release, as it is already "disruptive" due to the support removal, is a good release to do it in, if we document it clearly.

@zroubalik
Member Author

zroubalik commented Nov 29, 2022

But #3861 is changing the way we are getting metrics (and exposing them as Prometheus metrics). It is no longer happening in the Metrics Server; it is in the Operator.

So, if I understand this right, the operator will basically be doing all of the heavy lifting going forward: not just reconciling the CRDs, but also fetching the metrics for the HPA and caching them. And the metrics server will just query the operator?

Yes, and we are also reducing the number of open connections from KEDA.

@tomkerkhove
Member

tomkerkhove commented Nov 29, 2022

I think this release, as it is already "disruptive" due to the support removal, is a good release to do it in, if we document it clearly.

That is a different kind of breaking change, given that one is because of Kubernetes. Moving things to another endpoint and renaming them is going to break KEDA's operational story, which is bad.

This is also not in line with the deprecation policy that we have discussed on kedacore/governance#70.

Active KEDA users will have to manually change to the new endpoint and use the new metric names, which means this is a breaking change.

@JorTurFer
Member

JorTurFer commented Nov 29, 2022

Active KEDA users will have to manually change to the new endpoint and use the new metric names, which means this is a breaking change.

We could cover this by updating the chart, reducing the impact as much as possible. I think Helm is the most used tool for deploying KEDA, so if we make the needed changes there, we can make this change transparent to end users.

@tomkerkhove
Member

And yet, not everyone uses Helm, so it's a breaking change.

@zroubalik
Member Author

Well, technically the deprecation policy is not yet valid :)

I really think this change is in users' favour; consolidating the metrics in one place is a good thing, and if we want to have some sort of multitenancy, we would have to make this change anyway. So my call is to do the change now, together with the other breaking changes that we are introducing in 2.9.

@JorTurFer
Member

JorTurFer commented Nov 29, 2022

Technically, we are not introducing breaking changes in the operator, as those metrics aren't released yet; they are only in main. WDYT if we add an endpoint in the metrics server that requests the metrics from the operator?
I know this won't scale and won't support multi-tenancy, but in that case we can enforce the change, as this metrics-server endpoint would be deprecated in this version and deprecated features don't receive updates.
I also know that it's ugly from a design POV, but it could avoid the breaking change while still allowing us to make the move...
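
A rough sketch of that proxy idea, assuming a hypothetical in-cluster URL and port for the operator's metrics endpoint; nothing below is KEDA's actual wiring, it only illustrates re-serving the operator's Prometheus exposition from the metrics server.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

// operatorMetricsURL is a placeholder for wherever the operator serves /metrics.
const operatorMetricsURL = "http://keda-operator.keda.svc.cluster.local:8080/metrics"

func proxyOperatorMetrics(w http.ResponseWriter, r *http.Request) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(operatorMetricsURL)
	if err != nil {
		http.Error(w, "operator metrics unavailable", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	// Re-serve the operator's Prometheus exposition as-is, so existing scrape
	// configs pointed at the metrics server keep working for now.
	w.Header().Set("Content-Type", resp.Header.Get("Content-Type"))
	if _, err := io.Copy(w, resp.Body); err != nil {
		log.Printf("copying operator metrics: %v", err)
	}
}

func main() {
	http.HandleFunc("/metrics", proxyOperatorMetrics)
	log.Fatal(http.ListenAndServe(":9022", nil))
}
```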

@JorTurFer
Member

WRT metric names, we can expose both the old and the new metrics in the operator, deprecating the old metric names.
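
As a hedged illustration of exposing both names during the deprecation window (the label set and helper function are hypothetical, not the real KEDA collectors):

```go
package prommetrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// New name, exposed by the operator going forward.
	scalerErrorsNew = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "keda_scaler_errors", Help: "Number of scaler errors."},
		[]string{"namespace", "scaledObject", "scaler"},
	)
	// Old name, kept only for the deprecation window.
	scalerErrorsDeprecated = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "keda_metrics_adapter_scaler_errors", Help: "DEPRECATED: use keda_scaler_errors."},
		[]string{"namespace", "scaledObject", "scaler"},
	)
)

func init() {
	prometheus.MustRegister(scalerErrorsNew, scalerErrorsDeprecated)
}

// RecordScalerError bumps both series so existing dashboards and alerts keep
// working while users migrate to the new metric name.
func RecordScalerError(namespace, scaledObject, scaler string) {
	scalerErrorsNew.WithLabelValues(namespace, scaledObject, scaler).Inc()
	scalerErrorsDeprecated.WithLabelValues(namespace, scaledObject, scaler).Inc()
}
```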

@zroubalik
Member Author

Are you up for implementing this? :) I was thinking about this originally, but dropped the idea because it felt like a messy solution.

@zroubalik
Member Author

And it will block us from introducing the multi-tenant solution until we fully drop metrics from the Metrics Server. According to the governance policy in preparation, that would mean a year.

@JorTurFer
Member

JorTurFer commented Nov 29, 2022

Yes and no. We need to keep those metrics available for a year, true. But as a deprecated feature, we could say:
"Do you want multi-tenancy? The metrics exposed by the metrics server are deprecated and won't support it; migrate to the new endpoint to have it working with multi-tenancy."

So basically, from the metrics server endpoint we would only expose metrics from the same namespace as the metrics server.

@JorTurFer
Member

JorTurFer commented Nov 29, 2022

I'm not proposing a multi-tenant solution exposing the metrics from the metrics server at all. My proposal is just: go ahead with the new metrics approach and "proxy" the old metric endpoint to the operator in the same namespace, without multi-tenant support.

@tomkerkhove
Member

For me this is fairly clear on what we should do:

  1. Introduce the approach suggested above by @zroubalik as the new "ideal" solution
  2. Introduce feature flag on metric server to still serve our current metrics (*)
  3. Ship with feature flag on by default but officially deprecating it
  4. Next KEDA v2.11 has the flag turned off by default

Existing end-users are not broken but are aware things are moving, and they can start migrating. In 2 releases we still give them the option to do so, but give them a nudge to move because it's off by default.

Am I missing something?

(*) Scrape metrics from the operator and serve them in the metrics server; we can add a small memcache here to reduce load on the operator.
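
A rough sketch of point 2 plus the (*) note, assuming a hypothetical flag name, port, and TTL: a feature flag keeps the deprecated metrics-server endpoint alive, and a small time-based cache sits in front of the operator scrape so repeated Prometheus scrapes don't all hit the operator.

```go
package main

import (
	"flag"
	"log"
	"net/http"
	"sync"
	"time"
)

// Hypothetical flag; per the plan above it would default to true now and to
// false in v2.11.
var serveDeprecatedMetrics = flag.Bool(
	"serve-deprecated-prometheus-metrics", true,
	"Serve the deprecated keda_metrics_adapter_* metrics from the metrics server")

// scrapeCache keeps the last payload fetched from the operator for a short TTL.
type scrapeCache struct {
	mu      sync.Mutex
	body    []byte
	fetched time.Time
	ttl     time.Duration
	fetch   func() ([]byte, error) // e.g. the HTTP GET of the operator's /metrics from the earlier sketch
}

func (c *scrapeCache) get() ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.body != nil && time.Since(c.fetched) < c.ttl {
		return c.body, nil
	}
	body, err := c.fetch()
	if err != nil {
		return nil, err
	}
	c.body, c.fetched = body, time.Now()
	return body, nil
}

func main() {
	flag.Parse()
	if !*serveDeprecatedMetrics {
		log.Println("deprecated metrics endpoint disabled")
		return
	}
	cache := &scrapeCache{ttl: 5 * time.Second, fetch: func() ([]byte, error) {
		// Placeholder: fetch the operator's Prometheus exposition here.
		return []byte("# operator metrics payload\n"), nil
	}}
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		body, err := cache.get()
		if err != nil {
			http.Error(w, "operator metrics unavailable", http.StatusBadGateway)
			return
		}
		w.Write(body)
	})
	log.Fatal(http.ListenAndServe(":9022", nil))
}
```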

@v-shenoy
Contributor

For me this is fairly clear on what we should do:

  1. Introduce the approach suggested above by @zroubalik as the new "ideal" solution
  2. Introduce feature flag on metric server to still serve our current metrics (*)
  3. Ship with feature flag on by default but officially deprecating it
  4. Next KEDA v2.11 has the flag turned off by default

Existing end-users are not broken but are aware things are moving, and they can start migrating. In 2 releases we still give them the option to do so, but give them a nudge to move because it's off by default.

Am I missing something?

(*) Scrape metrics from the operator and serve them in the metrics server; we can add a small memcache here to reduce load on the operator.

I am not sure if this is a dumb question. But why specifically v2.11?

@JorTurFer
Member

I am not sure if this is a dumb question. But why specifically v2.11?

The in-progress deprecation policy says that deprecated features are removed after 4 releases (1 year), so 2.11 is just half of the deprecation time.

@tomkerkhove
Member

I went with current + 2, which is 2.11. This should be OK to do as it's still around, but we might want to clarify feature flags in the process.

@tomkerkhove
Member

For me this is fairly clear on what we should do:

  1. Introduce the approach suggested above by @zroubalik as the new "ideal" solution
  2. Introduce feature flag on metric server to still serve our current metrics (*)
  3. Ship with feature flag on by default but officially deprecating it
  4. Next KEDA v2.11 has the flag turned off by default

Existing end-users are not broken but are aware things are moving, and they can start migrating. In 2 releases we still give them the option to do so, but give them a nudge to move because it's off by default.

Am I missing something?

(*) Scrape metrics from the operator and serve them in the metrics server; we can add a small memcache here to reduce load on the operator.

Proposal also added to policy: https://github.com/kedacore/governance/pull/70/files#r1034591287

zroubalik changed the title from "Consolidate exposed Prometheus Metrics in one place" to "Consolidate all exposed Prometheus Metrics in KEDA Operator" on Nov 29, 2022
@zroubalik
Member Author

I have updated the issue description. We will introduce the new metrics and consolidate the old ones in the KEDA Operator, while keeping the old metrics in the Metrics Server as a deprecated option that will be removed in a future KEDA release.

@tomkerkhove
Member

Re-opening, as we need to announce the deprecation as well.

@tomkerkhove
Member

@zroubalik Can you please announce the deprecation as per https://github.com/kedacore/governance/blob/main/DEPRECATIONS.md?

@tomkerkhove
Member

Deprecation issue is #3972 with conversation on #3973
