Observed an issue while load testing with the HPA created from the collector CRD.

Context:
in my collector spec:
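The spec itself wasn't captured in this report, so here is a minimal sketch of the fields that matter for this scenario; the values are assumptions inferred from the output below (the name, the 12-pod ceiling, and ServiceMonitor-based discovery implied by the serviceMonitor/avalanche job), not the exact spec from my cluster:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: lightstep-collector
spec:
  mode: statefulset
  maxReplicas: 12              # enables the operator-managed HPA (assumed value)
  targetAllocator:
    enabled: true
    prometheusCR:
      enabled: true            # jobs named serviceMonitor/... imply ServiceMonitor discovery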
While load testing, the HPA scales the collector StatefulSet up to 12 pods. After I lower the metric workload, the HPA scales back down to a single collector pod in the StatefulSet.
What I expected:
I expected the target allocator to assign targets only to the remaining pod after scale-down.
What actually happened:
collector-0 does not have targets assigned in the TA, while collector-11 has the target I expected. Collector-11 was terminated and therefore should not have any targets.
$ kubectl get po -n opentelemetry
NAME                                                          READY   STATUS    RESTARTS   AGE
curl-moh                                                      1/1     Running   0          115m
lightstep-collector-collector-0                               1/1     Running   0          57m
lightstep-collector-targetallocator-b6865b5bb-dc4w5           1/1     Running   0          113m
opentelemetry-operator-controller-manager-575cdcbc57-4d24t    2/2     Running   0          11h
[root@curl-moh /]$ curl http://lightstep-collector-targetallocator:80/jobs/serviceMonitor%2Favalanche%2Favalanche%2F0/targets?collector_id=lightstep-collector-collector-0
[]
[root@curl-moh /]$ curl http://lightstep-collector-targetallocator:80/jobs/serviceMonitor%2Favalanche%2Favalanche%2F0/targets?collector_id=lightstep-collector-collector-11
[
  {
    "targets": [
      "10.0.7.184:9001"
    ],
    "labels": {...}
  }
]
I wonder if this has to do with the HPA stabilization window. The target allocator appears to reallocate targets whenever the target set changes, but the stabilization window means it takes several minutes for the surplus pods to actually be terminated. So the allocator sees the reduction in targets and reassigns them across all collectors still running (even though the HPA is about to scale down). If the targets don't change again after the scale-down completes, the result is a terminated collector pod still being assigned targets.
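For reference, this is the knob in question. Unless a behavior is set explicitly, an autoscaling/v2 HPA (including the one the operator creates) runs with the default scale-down stabilization of five minutes, so surplus pods linger well after the metric drops:

# autoscaling/v2 scale-down defaults (assumed; applies unless overridden)
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # ~5 minutes before surplus pods are removed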
After adding some logs, I found more info about why this bug occurs.
{"level":"info","ts":1665127123.2009046,"logger":"allocator","msg":"The number of collectors to be added is: 0"}
{"level":"info","ts":1665127123.2011936,"logger":"allocator","msg":"The number of collectors to be removed is: 0"}
{"level":"info","ts":1665127123.439969,"logger":"allocator","msg":"The number of collectors to be added is: 0"}
{"level":"info","ts":1665127123.440088,"logger":"allocator","msg":"The number of collectors to be removed is: 0"}
{"level":"info","ts":1665127127.2063422,"logger":"allocator","msg":"targets handled successfully"}
{"level":"info","ts":1665127338.1837273,"logger":"allocator","msg":"false","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1665127338.1838017,"logger":"allocator","msg":"Collector pod watch event stopped no event","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1665127427.2042353,"logger":"allocator","msg":"targets handled successfully"}
{"level":"info","ts":1665127437.2033138,"logger":"allocator","msg":"targets handled successfully"}
{"level":"info","ts":1665127767.2002187,"logger":"allocator","msg":"targets handled successfully"}
{"level":"info","ts":1665127797.1970944,"logger":"allocator","msg":"targets handled successfully"}
{"level":"info","ts":1665128392.1984103,"logger":"allocator","msg":"targets handled successfully"}
{"level":"info","ts":1665128397.1981525,"logger":"allocator","msg":"targets handled successfully"}
{"level":"info","ts":1665128402.1978114,"logger":"allocator","msg":"targets handled successfully"}
The pod watcher is stopped at some point during scale-down, and after that only SetTargets is ever called; no calls to SetCollectors are made. This means the TA operates on an inaccurate representation of the collector set.
This bug has been difficult to reproduce reliably, which makes it hard to test, but the issue seems to be related to this line.
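To illustrate the failure mode, here is a minimal sketch assuming the watcher sits on a raw client-go Watch; watchCollectors and the label selector are hypothetical names for illustration, not the actual TA code. The outer loop is the part the buggy version appears to be missing:

package watcher

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// watchCollectors is a hypothetical stand-in for the TA's collector pod
// watcher. A Kubernetes watch stream is routinely closed by the API server
// (e.g. on timeout); when ResultChan() closes, a watcher without the outer
// retry loop never observes another pod event -- so SetCollectors is never
// called again even though SetTargets keeps firing, matching the
// "Collector pod watch event stopped" log above.
func watchCollectors(ctx context.Context, cs kubernetes.Interface, ns string) error {
	for {
		if ctx.Err() != nil {
			return ctx.Err()
		}
		w, err := cs.CoreV1().Pods(ns).Watch(ctx, metav1.ListOptions{
			// Assumed selector; the real one matches the operator's collector labels.
			LabelSelector: "app.kubernetes.io/component=opentelemetry-collector",
		})
		if err != nil {
			log.Printf("pod watch failed, retrying: %v", err)
			time.Sleep(time.Second)
			continue
		}
		for event := range w.ResultChan() {
			// This is where the real allocator would recompute the collector
			// set from the pod events and call SetCollectors.
			_ = event
		}
		// ResultChan closed: loop around and re-establish the watch instead
		// of returning, so the collector view stays current.
		log.Println("collector pod watch closed; re-establishing")
	}
}

An alternative that sidesteps hand-rolled retry logic would be a client-go shared informer, which re-lists and re-watches automatically after the stream drops.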