Additional exported metrics for the metrics apiserver #4460
hi @BojanZelic
I was going to implement it for the api server only, because the operator already exposes Prometheus metrics for reconciliation time, errors, number of reconciliations, etc. But the api server lacks observability for ensuring correct responses and response times.
The metrics server should also have the same metrics, because it's built on top of the operator framework.
It looks like operator-framework-provided metrics already exist for the keda api server under a dedicated port, which is great. But what I want to achieve is to know whether a request is succeeding or failing from the perspective of the HPA controller, and to figure out what percentage of those requests fail; so the concept doesn't apply to the operator? Unless you mean the gRPC communication between the metrics server & the operator; in that case, yes, I could add those same metrics there.
Okay... it's complicated... From the operator's POV, I guess that we can expose similar metrics based on the gRPC requests. WDYT?
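For illustration, a minimal sketch of what gRPC-level server metrics could look like, assuming the go-grpc-prometheus interceptors are used; the wiring and ports below are hypothetical and not a description of KEDA's actual setup:

```go
package main

import (
	"net"
	"net/http"

	grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"google.golang.org/grpc"
)

func main() {
	// Count (and optionally time) every gRPC call handled by this server.
	server := grpc.NewServer(
		grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
		grpc.StreamInterceptor(grpc_prometheus.StreamServerInterceptor),
	)

	// ... register the gRPC services on `server` here ...

	// Pre-initialize per-method counters and enable a latency histogram.
	grpc_prometheus.Register(server)
	grpc_prometheus.EnableHandlingTimeHistogram()

	// Serve the default Prometheus registry, where go-grpc-prometheus
	// registers grpc_server_started_total, grpc_server_handled_total, etc.
	http.Handle("/metrics", promhttp.Handler())
	go func() { _ = http.ListenAndServe(":9090", nil) }() // port is illustrative

	lis, err := net.Listen("tcp", ":50051") // port is illustrative
	if err != nil {
		panic(err)
	}
	_ = server.Serve(lis)
}
```

Scraping that endpoint would then give per-method, per-status counters such as grpc_server_handled_total{grpc_code="OK", ...}, which is enough to compute a failure percentage for the traffic between the operator and the metrics server.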
I plan to remove them during this week, so if you wait until then, the metrics server will have only one Prometheus server, making things easier xD
Makes sense! Thanks for the explanation. The new metrics can be added there; as for gRPC metrics, it seems like we already have something in place for those.
@BojanZelic FYI, the deprecated Prometheus server from the Metrics Server has been removed. I don't follow the part about gRPC, what kind of stuff do you want to expose there? Thanks!
We have removed the second Prometheus server, but they can still add those metrics to the controller-runtime Prometheus registry (exposed on port 8080 in the metrics server).
So, an update: after some digging I discovered that there's actually another, undocumented metrics endpoint that already exposes these metrics without a code change. It needs to be accessed from an allowed role, and I updated the docs here: kedacore/keda-docs#1222. I ended up not needing the gRPC metrics, as they weren't useful and would essentially report similar latency metrics. I also tried to combine the metrics so that they're exposed via the controller-runtime port (8080) and endpoint, but was unsuccessful: there's no easy way to combine the two, since they both alter global variables.
Nice research!!! Exposing 2 metrics servers doesn't make sense IMHO and we should choose one. If there isn't any way to merge them, maybe we have to open issues upstream to allow this somehow. @zroubalik?
Got it to work under the existing controller-runtime port and endpoint; what do you think?
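For context, a rough sketch of the pattern being discussed here: registering a custom collector on controller-runtime's shared registry so it is served by the manager's existing metrics endpoint (by default :8080/metrics). The metric name is made up for illustration, not one of KEDA's real metrics:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Hypothetical metric for illustration only.
var externalMetricsRequests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "example_external_metrics_api_requests_total",
		Help: "Requests served by the external metrics API, by result.",
	},
	[]string{"result"},
)

func init() {
	// controller-runtime's manager already serves everything registered on
	// this registry at its metrics bind address, so no second Prometheus
	// server (and no second port) is needed.
	ctrlmetrics.Registry.MustRegister(externalMetricsRequests)
}
```

Handlers would then just call externalMetricsRequests.WithLabelValues("error").Inc() (or "success"), and the series shows up alongside the existing controller-runtime metrics.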
@JorTurFer I created a PR to consolidate the metrics to one port; PTAL #4982
Proposal
Provide exported metrics similar to those the Kubernetes metrics-server has, which are useful for tracking SLOs for KEDA (e.g. latency, or % of successful requests, etc.).
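As an illustration of the kind of metrics meant (the names below are hypothetical, not actual KEDA or metrics-server metric names), a minimal Go sketch using the Prometheus client:

```go
package slo

import "github.com/prometheus/client_golang/prometheus"

var (
	// Total requests by HTTP status code, for computing the % of successful requests.
	apiRequestsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "example_metrics_api_requests_total",
			Help: "Requests to the metrics API, labeled by HTTP status code.",
		},
		[]string{"code"},
	)

	// Request latency, for latency SLOs.
	apiRequestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "example_metrics_api_request_duration_seconds",
			Help:    "Latency of metrics API requests.",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"code"},
	)
)

func init() {
	prometheus.MustRegister(apiRequestsTotal, apiRequestDuration)
}
```

With metrics shaped like that, the success ratio is just sum(rate(example_metrics_api_requests_total{code=~"2.."}[5m])) / sum(rate(example_metrics_api_requests_total[5m])), and latency percentiles come from the histogram.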
Use-Case
I would like to get the number of requests that have failed to the keda api server; currently this isn't possible. With metrics-server, I can do something like querying apiserver_request_total{group="metrics.k8s.io"}, because that metric is exposed by the metrics-server, and metrics-server uses the apiserver package from Kubernetes. KEDA could expose similar metrics, either by using the apiserver package from Kubernetes or by implementing something similar; a rough sketch of the second option is below.
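The sketch below shows the "implementing something similar" option as a plain HTTP middleware that mirrors the shape of apiserver_request_total; the metric name, labels, path, and port are hypothetical, not KEDA's actual code:

```go
package main

import (
	"net/http"
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Mirrors the shape of apiserver_request_total; name and labels are hypothetical.
var requestsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "example_external_metrics_requests_total",
		Help: "Requests to the external metrics API, labeled by status code.",
	},
	[]string{"code"},
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument counts every request by the response code it produced.
func instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)
		requestsTotal.WithLabelValues(strconv.Itoa(rec.status)).Inc()
	})
}

func main() {
	prometheus.MustRegister(requestsTotal)

	// apiHandler stands in for the real external metrics API handler.
	apiHandler := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	http.Handle("/apis/external.metrics.k8s.io/v1beta1/", instrument(apiHandler))
	http.Handle("/metrics", promhttp.Handler())
	_ = http.ListenAndServe(":8080", nil) // port is illustrative
}
```

The failed-request count asked for in this issue would then simply be the series with non-2xx code labels.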
Is this a feature you are interested in implementing yourself?
Yes
Anything else?
The aggregation API from Kubernetes tracks failed requests to external.metrics.k8s.io via apiserver_request_terminations_total; it does not track the total # of requests, though. There's currently no way to get the total # of requests to external.metrics.k8s.io, since the keda metrics apiserver doesn't expose this information.