Metric names become incorrect if any of ScaledObject's triggers is unavailable #2592
Comments
@rwkarg do you see those incorrect metric names in metrics server? What metric names do you see in the
The ScaledObject has the same incorrect list of metric names (duplicate The metrics server has the expected metrics (single instance of
Also noted that when this happens, the duplicate is always the last index
Maybe found something? This might be because there are both a message-based and a rate-based trigger on the same queue, but I don't understand exactly how there's a duplicate with the
I think that the By chance, do you see any other errors? One of the rabbit instances might have some temporary issues, causing that scaler to be unable to connect there? Anything special about your ScaledObjects config?
I have probably found the bug: #2593, and we will most likely release 2.6.1 next week with this fix. @rwkarg it would be great if you could check that fix on one of your setups, if that's possible. Thanks!
Testing out
Fix is looking good. All HPAs across all clusters still have the correct metric names after the weekend.
@rwkarg excellent, thanks for the message!
Report
After some amount of time of successful operation, the HPA reports that some of the metrics it is looking for are not available. This is because the name of the metric in the HPA has been changed to something that appears to be incorrect. Restarting the operator and metrics-apiserver will temporarily correct the metric names in the HPA.
Initial report from Slack: https://kubernetes.slack.com/archives/C01JGDP8MB8/p1643049613000900
I first observed this with 2.5.0 and it is still occurring with 2.6.0. Prior to 2.5.0 we were running 2.2.0 so I don't know about the versions in between.
Initial list of metrics in the HPA when it's working (note that the sN- prefixes are all unique, strictly increasing numbers):
That same HPA later stops scaling appropriately (it scales out to max instances) and reports the following metric names. Note the duplicate s7- prefixes and the absence of an s1- prefix. The s7-rabbitmq-Europa-OnlinePlayerStatus-processUpdateIssueOnlineStatusRequest metric is reported as being unavailable, which makes sense since it is actually s1-rabbit... on the metrics server:
Expected Behavior
The metric names written to the HPA should match the metrics being sent to the metrics server.
Actual Behavior
The metric names deviate from what is being sent to the metrics server.
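The symptom described in this report (a duplicated sN- prefix together with a missing one) can be checked for mechanically. Below is a minimal sketch in Go, not part of KEDA: `CheckPrefixes`, `PrefixReport`, and the sample metric names are hypothetical, and the sketch assumes a healthy list uses every index from 0 to len(names)-1 exactly once.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// prefixRe matches the "sN-" index prefix that the report describes on metric names.
var prefixRe = regexp.MustCompile(`^s(\d+)-`)

// PrefixReport summarises the index prefixes found in a list of metric names.
// Hypothetical type, introduced only for this sketch.
type PrefixReport struct {
	Duplicates []int // indices used by more than one metric name
	Missing    []int // indices in 0..len(names)-1 that no name uses
}

// CheckPrefixes assumes that a healthy list uses each index from 0 to
// len(names)-1 exactly once. That is an assumption of this sketch.
func CheckPrefixes(names []string) PrefixReport {
	counts := make(map[int]int)
	for _, name := range names {
		m := prefixRe.FindStringSubmatch(name)
		if m == nil {
			continue // no index prefix; ignored in this sketch
		}
		idx, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		counts[idx]++
	}

	var report PrefixReport
	for idx, n := range counts {
		if n > 1 {
			report.Duplicates = append(report.Duplicates, idx)
		}
	}
	sort.Ints(report.Duplicates) // map iteration order is random

	for idx := 0; idx < len(names); idx++ {
		if counts[idx] == 0 {
			report.Missing = append(report.Missing, idx)
		}
	}
	return report
}

func main() {
	// Made-up names that mimic the pattern in the report: one index duplicated, one absent.
	names := []string{"s0-rabbitmq-queue-a", "s2-rabbitmq-queue-b", "s2-rabbitmq-queue-c"}
	fmt.Printf("%+v\n", CheckPrefixes(names))
}
```

For the sample input, index 2 is reported as duplicated and index 1 as missing. Feeding it the metric names listed on an affected HPA would flag the same kind of mismatch without comparing against the metrics server by hand.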
Steps to Reproduce the Problem
I'm running KEDA on 12 identical clusters right now and most of them don't run into this issue. For those that do, it can take anywhere from a few hours to a few days before this shows up.
Logs from KEDA operator
KEDA Version
2.6.0
Kubernetes Version
1.21
Platform
Google Cloud
Scaler Details
RabbitMQ
Anything else?
scalerIndex is pulled from this for loop, which is where the number for the sN- prefix of metric names comes from:
keda/pkg/scaling/scale_handler.go
Line 285 in e44a31e
Given that, I'm not clear on how, in the above example, there is no s1- prefix and there are duplicate s7- prefixes.
Initially it was thought that this might be related to #2407, which was implemented in 2.6.0, but after upgrading to 2.6.0 this is still occurring.
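One hypothetical way a loop-derived index can end up duplicated at the last position is a closure that captures a shared index variable and is re-run after the loop has finished. The Go sketch below only illustrates that general pitfall. It is not taken from scale_handler.go, and nothing in this thread confirms that it is what happened in KEDA or what #2593 changed. All function names and trigger names are made up.

```go
package main

import "fmt"

// metricName mimics the "sN-name" shape seen in the report.
func metricName(index int, name string) string {
	return fmt.Sprintf("s%d-%s", index, name)
}

// buildSharedIndex creates one lazy name builder per trigger. Every builder
// closes over the same index variable, so a builder that runs after the loop
// has finished sees the final index rather than its own.
func buildSharedIndex(triggers []string) []func() string {
	builders := make([]func() string, 0, len(triggers))
	var i int // declared outside the loop: shared by all closures
	for i = range triggers {
		name := triggers[i]
		builders = append(builders, func() string { return metricName(i, name) })
	}
	return builders
}

// buildCopiedIndex copies the index on each iteration, so every builder
// keeps the index it was created with.
func buildCopiedIndex(triggers []string) []func() string {
	builders := make([]func() string, 0, len(triggers))
	for i, name := range triggers {
		index, name := i, name // per-iteration copies
		builders = append(builders, func() string { return metricName(index, name) })
	}
	return builders
}

func main() {
	triggers := []string{"queue-a", "queue-b", "queue-c"}

	// Re-run the builder for trigger 1 after the loop, as a hypothetical
	// "rebuild this scaler after a failure" step might.
	fmt.Println(buildSharedIndex(triggers)[1]()) // s2-queue-b
	fmt.Println(buildCopiedIndex(triggers)[1]()) // s1-queue-b
}
```

With the shared variable, the rebuilt name for the second trigger carries the last index (s2-) instead of its own (s1-), which is the same shape as the duplicate s7- and missing s1- in the report. The index variable is declared outside the loop on purpose so the behaviour does not depend on the Go version's loop-variable semantics.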