Colocating metrics provider along with the operator causes HPA delays if not configured properly #5624
Comments
Willing to participate in the discussion and to contribute to code/documentation.
Hello @bharathguvvala, I think the problem should be addressed by improving our docs about this and how to solve it. I'd rather address it in the docs because updating the CR is part of KEDA's operation (not only the fallback) and it's part of the CRD, so users can build checking systems on top of it. To give some context about that change: the underlying controller added rate limit support with quite restrictive values (5 and 10 IIRC), and we increased those values for the most common user profile (< 100 ScaledObjects) to avoid bursting the API server by mistake. If you are willing to help with the documentation, it'd be awesome! Real user experiences are the best teachers for other folks.
@JorTurFer For ScaledObjects that have not been configured with the fallback option, isn't it better to skip updating the fallback health status on every metrics read call, which would avoid redundant calls to the K8s API server? Instead, this condition could be updated from the controller during ScaledObject reconciliations. That would also mean the fallback health stats are only updated for ScaledObjects where the fallback is enabled, while for the rest they are defaulted at ScaledObject reconciliation time.
Regarding the documentation, I'll raise a PR in the next couple of days which provides guidance on setting up and configuring KEDA on large clusters -- ones with a high number of deployments.
Your point is interesting; it's true that updating the fallback all the time when the feature is disabled doesn't make sense. Checking the code, I have noticed that we can be updating the value even though there isn't any change. Checking the fallback logic, we are calling Lines 115 to 129 in f2d86a8
I think that we can improve that logic to reduce the calls to the API server. @zroubalik @dttung2905 @wozniakjan WDYT? |
That is a pretty good optimization, especially given there is already Line 116 in f2d86a8, and the check could be as simple as !reflect.DeepEqual()
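To make that concrete, here is a rough sketch of what such a change-detection guard could look like, assuming KEDA's v1alpha1 types and a controller-runtime client; the helper name is illustrative, not the actual KEDA code:

```go
package fallback

import (
	"context"
	"reflect"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// setHealthIfChanged writes the health entry back only when it differs from
// what is already stored in the ScaledObject status, skipping the API server
// round trip for no-op updates. (Sketch only, assuming the v1alpha1 API shape.)
func setHealthIfChanged(ctx context.Context, c client.Client, so *kedav1alpha1.ScaledObject, metricName string, health kedav1alpha1.HealthStatus) error {
	if current, ok := so.Status.Health[metricName]; ok && reflect.DeepEqual(current, health) {
		// Nothing changed since the last reading; avoid a redundant update call.
		return nil
	}
	if so.Status.Health == nil {
		so.Status.Health = map[string]kedav1alpha1.HealthStatus{}
	}
	so.Status.Health[metricName] = health
	return c.Status().Update(ctx, so)
}
```

The comparison happens entirely in memory, so a no-op update costs nothing against the client-side rate limiter.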
Thank you @bharathguvvala, that would be terrific. Also, feel free to introduce code improvements; generally, smaller PRs are easier to get merged.
@wozniakjan @JorTurFer I have made a change to disable health status updates if the fallback for the ScaledObject is not configured. I will go ahead with adding tests if this change is okayed in terms of intent. On top of this, we could add logic to avoid redundant updates where a fallback is configured in a ScaledObject.
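A minimal sketch of that guard, under the same assumptions about KEDA's v1alpha1 types as above (the predicate name is hypothetical):

```go
package fallback

import (
	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
)

// shouldUpdateHealthStatus gates the write sketched earlier: when no fallback
// is configured on the ScaledObject, the health status is never persisted, so
// the metrics read path makes no extra call to the API server for it.
func shouldUpdateHealthStatus(so *kedav1alpha1.ScaledObject) bool {
	return so.Spec.Fallback != nil
}
```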
@JorTurFer @wozniakjan Taking a step back, I was wondering whether this error count information needs to be written back to the ScaledObject status at all. Since it is transient information used to implement a sort of circuit breaking, isn't it more appropriate to keep it inside the operator (in memory) and only update the condition whenever it flips? I presume the information is stored in the status only to make it persistent and survive operator restarts, but that's also expensive, considering the update is performed in the GetMetrics read path and can potentially affect GetMetrics latencies, which in turn can affect the autoscaler SLOs. If the same error count can be reconstructed from scratch based on new errors, why not avoid persisting it?
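For illustration only, the in-memory bookkeeping suggested here could look roughly like the following, with a status write happening only when the condition flips; all names are hypothetical and this is not KEDA's implementation:

```go
package fallback

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// fallbackTracker keeps per-ScaledObject failure counts in operator memory
// instead of persisting every increment to the ScaledObject status.
type fallbackTracker struct {
	mu       sync.Mutex
	failures map[types.NamespacedName]int32
}

func newFallbackTracker() *fallbackTracker {
	return &fallbackTracker{failures: map[types.NamespacedName]int32{}}
}

// recordFailure increments the count and reports true only when the count
// first reaches the threshold, so the caller updates the status condition
// once per flip instead of on every GetMetrics call.
func (t *fallbackTracker) recordFailure(key types.NamespacedName, threshold int32) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.failures[key]++
	return t.failures[key] == threshold
}

// recordSuccess resets the count and reports true if the fallback had been
// active, i.e. the condition flips back to healthy.
func (t *fallbackTracker) recordSuccess(key types.NamespacedName, threshold int32) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	wasActive := t.failures[key] >= threshold
	t.failures[key] = 0
	return wasActive
}
```

The trade-off, as noted, is that the counts reset to zero whenever the operator restarts.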
What do you mean? How would you reconstruct it? It's an interesting point.
Report
At our organization (Flipkart), we manage large k8s clusters where at least 300 ScaledObjects are present. Lately, when we onboarded a bunch of new workloads to be managed by ScaledObjects, we saw substantial degradation of the HPA functionality, where scaling events weren't happening in line with changes to the corresponding compute metrics.
A bit of background here -- recently we upgraded k8s from 1.22.x to 1.25.x and KEDA from 2.2.x to 2.10.x in order to migrate to HPA v2 as well as move to a newer KEDA and bring in all the goodness of an upgrade. Post upgrade everything was fine, with the ScaledObject count at a cluster level around 200+. As mentioned above, when we onboarded a bunch of new ScaledObjects (50+ in count) we started seeing substantial degradation of the autoscaling pipeline, which impacted all the workloads since they were not scaled out/down in time. Based on our observations, scaling generally took more than 10 minutes.
Upon troubleshooting and profiling a bunch of components such as the controller manager, k8s metrics server, KEDA metrics adapter, and operator, we saw that the external metrics served by the KEDA metrics adapter were latent, with p99 latencies around ~5s. After profiling and a code walkthrough, we realized that the metrics scraping model changed in one of the recent releases of KEDA: metrics are now fetched from the operator (via a gRPC client) as opposed to the earlier design where they were fetched by the metrics adapter. We saw that in this flow, calls are made to the K8s API server to update the fallback health status of the ScaledObjects. These calls were getting rate limited on the client side, causing higher read latencies for the external metrics API. We realized that the k8s clients used by the reconciliation controllers and the metrics scraper are the same and share the default rate limits. Upon increasing these rate limits from the default (20 QPS and 30 burst), the issue was resolved.
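For context on why the reconcilers and the metrics read path throttle each other: client-go applies a client-side token-bucket limiter configured through rest.Config, and an operator built on controller-runtime typically builds everything from one such config. A minimal sketch of where those knobs live (the values are examples only; how KEDA exposes these settings depends on the version):

```go
package example

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newRaisedLimitClient builds a client with higher client-side rate limits.
// Everything that shares this rest.Config draws from the same token bucket,
// so the defaults can throttle the metrics reads under load.
func newRaisedLimitClient() (client.Client, error) {
	cfg := ctrl.GetConfigOrDie()
	cfg.QPS = 50    // sustained requests per second allowed on the client side
	cfg.Burst = 100 // short-term burst allowance (example values only)
	return client.New(cfg, client.Options{})
}
```

Raising QPS/Burst trades a larger allowed request rate against the API server for lower latency on the metrics path.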
While we weren't aware of this design change, which essentially merged the scraping component into the operator, we were confused as to why k8s API update calls are made as part of the metrics read path. We feel a couple of changes can be made to help KEDA users deploy and support KEDA on large-scale clusters: documenting how to tune the client-side rate limits, and avoiding redundant API update calls in the metrics read path.
Expected Behavior
Actual Behavior
Delays in HPA pipelines
Steps to Reproduce the Problem
Deploy with default configuration and create 300+ scaledobjects.
KEDA Version
2.10.1
Kubernetes Version
< 1.26
Platform
Other
Scaler Details
No response
Anything else?
No response