Description
What happened?
I use two custom metrics, A and B, in my HPA setup. A is a gauge-based metric called SLA Metric, while B is a count-based metric that tracks failed requests with HTTP status code 502 or 503 from Istio. Both metrics are scraped by Prometheus.
To use custom metrics in HPA, we're using the Kube Metrics Adapter. When the application load increases, the value of the SLA Metric also increases, and the pods scale up until they reach the maximum replica count, as expected.
However, when the load dissipates, the pods never scale down. Even though the SLA Metric's value in Prometheus drops below the target, the HPA description keeps showing a stale value for the metric, which can be above or below the target.
One possible reason is that metric B relies on Istio requests: when there have been no failed requests with status code 502 or 503, the Prometheus query returns no data, and the metric shows up as unknown.
We noticed this behavior after upgrading Kubernetes from version 1.21 to 1.24, changing the HPA API version from autoscaling/v2beta2 to autoscaling/v2, and upgrading kube-metrics-adapter from v0.1.16 to v0.1.19.
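For context, here is a minimal sketch of the HPA manifest behind the state shown below. The queries and target values are taken from the describe output; the rest of the spec is my reconstruction based on kube-metrics-adapter's Object-metric annotation convention, so treat it as illustrative rather than exact:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
  namespace: namespace
  annotations:
    # Metric A: average SLA breach over the last 10 minutes (gauge)
    metric-config.object.avg-sla-breach.prometheus/query: |
      avg(avg_over_time(is_sla_breach{app="my-pod", canary="false"}[10m]))
    # Metric B: per-replica rate of 502/503 responses (abbreviated here;
    # the full query, including the division by replica count, appears in
    # the describe output below)
    metric-config.object.istio-requests-total.prometheus/per-replica: "true"
    metric-config.object.istio-requests-total.prometheus/query: |
      sum(rate(istio_requests_total{response_code=~"502|503",
        destination_service="my-pod.namespace.svc.cluster.local"}[1m]))
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-pod
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Object
      object:
        describedObject: {apiVersion: v1, kind: Pod, name: my-pod}
        metric: {name: avg-sla-breach}
        target: {type: Value, value: 500m}
    - type: Object
      object:
        describedObject: {apiVersion: v1, kind: Pod, name: my-pod}
        metric: {name: istio-requests-total}
        target: {type: Value, value: 200m}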
kubectl describe hpa my-hpa
Name: my-hpa
Namespace: namespace
Labels: app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: my-pod
meta.helm.sh/release-namespace: default
metric-config.object.avg-sla-breach.prometheus/query:
avg(
avg_over_time(
is_sla_breach{
app="my-pod",
canary="false"
}[10m]
)
)
metric-config.object.istio-requests-total.prometheus/per-replica: true
metric-config.object.istio-requests-total.prometheus/query:
sum(
rate(
istio_requests_total{
response_code=~"502|503",
destination_service="my-pod.namespace.svc.cluster.local"
}[1m]
)
) /
count(
count(
container_memory_usage_bytes{
namespace="namespace",
pod=~"my-pod.*"
}
) by (pod)
)
CreationTimestamp: Wed, 12 Jul 2023 17:52:21 +0530
Reference: Deployment/my-pod
Metrics: ( current / target )
"istio-requests-total" on Pod/my-pod (target value): <unknown> / 200m
"avg-sla-breach" on Pod/my-pod (target value): 833m / 500m
Min replicas: 1
Max replicas: 3
Deployment pods: 3 current / 3 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetObjectMetric the HPA was unable to compute the replica count: unable to get metric istio-requests-total: Pod on namespace my-pod/unable to fetch metrics from custom metrics API: the server could not find the metric istio-requests-total for pods my-pod
ScalingLimited True TooManyReplicas the desired replica count is more than the maximum replica count
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetObjectMetric 2m14s (x140768 over 25d) horizontal-pod-autoscaler unable to get metric istio-requests-total: Pod on namespace my-pod/unable to fetch metrics from custom metrics API: the server could not find the metric istio-requests-total for pods my-pod
To troubleshoot this further, we checked the metric value using:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/pods/my-pod/avg-sla-breach"
Output:
{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/pods/my-pod/avg-sla-breach"},"items":[{"describedObject":{"kind":"Pod","namespace":"my-namespace","name":"my-pod","apiVersion":"v1"},"metricName":"avg-sla-breach","timestamp":"2023-08-07T08:14:35Z","value":"0","selector":null}]}
Although the raw custom metrics API returns zero for avg-sla-breach, the HPA description still displays the stale value (833m).
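The analogous raw query can be issued for metric B; consistent with the FailedGetObjectMetric events above, it is expected to return a NotFound error rather than a value:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/pods/my-pod/istio-requests-total"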
Workaround:
HPA behaves as expected when metric B is removed entirely, or when its query is modified to return 0 instead of an empty result when it matches no samples.
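For reference, a sketch of the modified annotation for metric B, using PromQL's "or on() vector(0)" idiom so the numerator falls back to 0 when no 502/503 samples exist; only the wrapping is new, the rest is the original query:

metric-config.object.istio-requests-total.prometheus/query: |
  (
    sum(
      rate(
        istio_requests_total{
          response_code=~"502|503",
          destination_service="my-pod.namespace.svc.cluster.local"
        }[1m]
      )
    )
    # fall back to 0 when no 502/503 series exist
    or on() vector(0)
  ) /
  count(
    count(
      container_memory_usage_bytes{
        namespace="namespace",
        pod=~"my-pod.*"
      }
    ) by (pod)
  )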
What did you expect to happen?
HPA should scale down properly based on one of the metrics, even when the other metric value is not available.
How can we reproduce it (as minimally and precisely as possible)?
- Set up the Kube Metrics Adapter.
- Create a custom-metric-based HPA that uses two metrics, one of which returns no data (see the sketch after this list).
- Increase the load (i.e., the value of the other metric) so that the HPA kicks in and scales the pods up to the maximum replica count.
- Reduce the load (i.e., the value of that metric); the HPA's reported value stays stuck at a stale reading and the pods never scale down.
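A minimal sketch of step 2, assuming the same kube-metrics-adapter annotation convention; every name here (repro-hpa, repro-app, my_load_gauge) is hypothetical, and always-empty deliberately queries a metric that does not exist:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: repro-hpa
  annotations:
    # Metric 1: any gauge you can drive up and down with load (hypothetical)
    metric-config.object.load-metric.prometheus/query: |
      avg(my_load_gauge{app="repro-app"})
    # Metric 2: matches no series, so the adapter reports <unknown>
    metric-config.object.always-empty.prometheus/query: |
      sum(nonexistent_metric_name)
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: repro-app
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Object
      object:
        describedObject: {apiVersion: v1, kind: Pod, name: repro-app}
        metric: {name: load-metric}
        target: {type: Value, value: 500m}
    - type: Object
      object:
        describedObject: {apiVersion: v1, kind: Pod, name: repro-app}
        metric: {name: always-empty}
        target: {type: Value, value: 200m}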
Anything else we need to know?
Has anyone faced similar issues with HPA, or has HPA's behavior with multiple metrics changed recently, especially for scale-down events? Can anyone from the community look into this issue and provide some clarity?