Added auto-scale to the collector #856

jpkrohling · 2020-01-16T16:14:48Z

This PR adds a Horizontal Pod Autoscaler (HPA) for the collector, along with two new properties, MinReplicas and MaxReplicas. With that, the collector should now automatically scale up and down based on CPU and/or memory consumption. When neither of the new properties is specified, the minimum number of replicas is 1 and the maximum is 100. The HPA configuration is added only when the deployment strategy is either production or streaming.
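To make the defaulting rules above concrete, here is a minimal sketch in Go. It is not the code from this PR: the `Strategy`, `AutoscaleSpec`, and `ReplicaBounds` types and the `HPABounds` function are hypothetical names made up for illustration, and only the values (minimum 1, maximum 100, HPA only for production or streaming) come from the description.

```go
package main

import "fmt"

// Strategy is a hypothetical stand-in for the operator's deployment strategy.
type Strategy string

const (
	AllInOne   Strategy = "allInOne"
	Production Strategy = "production"
	Streaming  Strategy = "streaming"
)

// AutoscaleSpec is a hypothetical holder for the two new properties.
// A nil pointer means the property was not specified.
type AutoscaleSpec struct {
	MinReplicas *int32
	MaxReplicas *int32
}

// ReplicaBounds is the resolved range the HPA would be allowed to scale within.
type ReplicaBounds struct {
	Min int32
	Max int32
}

// Defaults taken from the PR description.
const (
	defaultMinReplicas int32 = 1
	defaultMaxReplicas int32 = 100
)

// HPABounds returns the replica bounds and true when an HPA should be
// created, or a zero value and false for strategies that get no HPA.
func HPABounds(strategy Strategy, spec AutoscaleSpec) (ReplicaBounds, bool) {
	if strategy != Production && strategy != Streaming {
		return ReplicaBounds{}, false
	}
	bounds := ReplicaBounds{Min: defaultMinReplicas, Max: defaultMaxReplicas}
	if spec.MinReplicas != nil {
		bounds.Min = *spec.MinReplicas
	}
	if spec.MaxReplicas != nil {
		bounds.Max = *spec.MaxReplicas
	}
	return bounds, true
}

func main() {
	five := int32(5)
	bounds, ok := HPABounds(Production, AutoscaleSpec{MaxReplicas: &five})
	fmt.Println(bounds.Min, bounds.Max, ok)
}
```

Using pointers for the two properties is one way to distinguish "not specified" from an explicit value, which is what lets the defaults apply only when a property is absent.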

Closes #848, even though the scaling of the storage isn't implemented by this one.

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

jpkrohling · 2020-01-16T16:16:40Z

This shows how the auto-scaling works in OpenShift. Note that this should also work in plain Kubernetes, but I'm not able to generate enough load on my local machine with minikube + ES (1 GiB) + tracegen.

jpkrohling · 2020-01-16T16:18:40Z

@kevinearls not sure we want to add a new e2e test for this, but perhaps you might have a good idea that wouldn't be too fragile?

jpkrohling · 2020-01-17T10:51:09Z

I just ran a longer test this morning, showing that it scales up and down. First, I deployed the simple-prod instance, along with tracegen (10 replicas). After about 20 minutes, 10 replicas of the collector were available. Removing the tracegen deployment caused the number of replicas to go back down to 1 after about 20 minutes. Then, I changed simple-prod to set maxReplicas to 5, deployed tracegen again, and verified that the collector gets scaled only up to 5 replicas. Removing tracegen again causes the collector to eventually settle at 1 replica.

$ kubectl get deployments
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
simple-prod-collector   1/1     1            1           2m2s
simple-prod-query       1/1     1            1           2m2s
tracegen                10/10   10           10          30s

$ kubectl get deployments
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
simple-prod-collector   10/10   10           10          19m
simple-prod-query       1/1     1            1           19m
tracegen                10/10   10           10          17m

$ kubectl get deployments
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
simple-prod-collector   1/1     1            1           44m
simple-prod-query       1/1     1            1           44m

$ kubectl get deployments
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
simple-prod-collector   5/5     5            5           59m
simple-prod-query       1/1     1            1           59m
tracegen                10/10   10           10          11m

$ kubectl get deployments
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
simple-prod-collector   5/5     5            5           71m
simple-prod-query       1/1     1            1           71m
tracegen                10/10   10           10          24m

$ kubectl get deployments
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
simple-prod-collector   5/5     5            5           71m
simple-prod-query       1/1     1            1           71m

$ kubectl get deployments
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
simple-prod-collector   1/1     1            1           88m
simple-prod-query       1/1     1            1           88m

And here's a set of screenshots from OpenShift (older events are at the bottom):

objectiser

Looks good. Just some minor comments.

deploy/crds/jaegertracing.io_jaegers_crd.yaml

objectiser · 2020-01-17T16:30:11Z

deploy/examples/tracegen.yaml

+    spec:
+      containers:
+      - name: tracegen
+        image: jaegertracing/jaeger-tracegen:latest


Shouldn't the tracegen image be versioned in line with the other Jaeger components?

Yes, opened an issue (#866) to track this. The first image will probably be 1.17 (next release).

pkg/apis/jaegertracing/v1/jaeger_types.go

pkg/deployment/collector.go

objectiser · 2020-01-17T16:57:07Z

pkg/deployment/collector_test.go

@@ -21,7 +21,7 @@ func init() {

 func TestNegativeReplicas(t *testing.T) {
 	size := int32(-1)
-	jaeger := v1.NewJaeger(types.NamespacedName{Name: "TestNegativeReplicas"})
+	jaeger := v1.NewJaeger(types.NamespacedName{Name: "my-instance"})


What happened to the convention of naming instances after the test?

We started naming them all "my-instance" some time ago, as individual names don't bring much value and we had a few copy/paste mistakes.

objectiser · 2020-01-20T12:29:14Z

@jpkrohling Approval subject to tests passing :)

jpkrohling · 2020-01-20T12:32:05Z

A local run showed that a new permission is missing. I'm testing it locally and will update the PR once I confirm the tests are passing.

jpkrohling requested a review from objectiser January 16, 2020 16:17

jpkrohling changed the title ~~Added auto-scale to the collector~~ WIP - Added auto-scale to the collector Jan 16, 2020

jpkrohling changed the title ~~WIP - Added auto-scale to the collector~~ Added auto-scale to the collector Jan 17, 2020

jpkrohling force-pushed the Autoscale branch from ab0e694 to 5040bbf Compare January 17, 2020 10:59

jpkrohling force-pushed the Autoscale branch from 5040bbf to fbce225 Compare January 17, 2020 12:56

objectiser reviewed Jan 17, 2020

jpkrohling mentioned this pull request Jan 20, 2020

Change tracegen example to use a versioned image #866

Closed

Fixes based on the review

113e25a

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

objectiser approved these changes Jan 20, 2020

Added permissions for hpa

996b035

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

jpkrohling merged commit 2e5969c into jaegertracing:master Jan 20, 2020

mumrau mentioned this pull request Feb 25, 2020

Automatically scale components based on metrics #428

Closed

pavolloffay mentioned this pull request Feb 21, 2022

Collector replicas count and HorizontalPodAutoscaler open-telemetry/opentelemetry-operator#729

Closed

kevinearls mentioned this pull request Jun 27, 2022

Update HorizontalPodAutoscaler code from v1 open-telemetry/opentelemetry-operator#943

Closed
