
Auto-scale collector instances #848

Closed

Description

Based on an (old) Gitter message from @black-adder, it would be good to let the Jaeger Operator scale Jaeger up and down automatically:

As a rule of thumb, if you see that spans are being dropped, I'd go ahead and add more jaeger-collector hosts. If that doesn't mitigate, then scale out your cassandra cluster.

@objectiser then shared that:

We should look to add some autonomic behaviour to the management of jaeger, by monitoring relevant metrics to understand when reporting/storing trace data is resulting in issues and take action (e.g. scale up).

Possibly an action can initially try scaling up the collector, but if that does not change the no. of dropped spans significantly after a specified time, then it tries to scale up the storage.

A trickier situation would be determining when to scale down. That may be a combination of Jaeger metrics with possibly some other factors (e.g. CPU utilisation) - but only scaling down to a predefined minimum config.
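The escalation idea above (scale the collector first, fall back to the storage if dropped spans don't fall, and never scale below a configured minimum) can be sketched as a small decision function. This is a hypothetical Python sketch, not something the Jaeger Operator implements: the names `Policy`, `State`, and `decide`, and every threshold (a 5-minute settle time, a required 50% reduction in dropped spans, 30% CPU for scale-down), are assumptions, and the dropped-span rate and CPU utilisation are assumed to come from whatever metrics source the operator would watch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    SCALE_UP_COLLECTOR = "scale-up-collector"
    SCALE_UP_STORAGE = "scale-up-storage"
    SCALE_DOWN_COLLECTOR = "scale-down-collector"


@dataclass
class Policy:
    # All thresholds are hypothetical defaults, not values from Jaeger or the operator.
    min_collectors: int = 1
    max_collectors: int = 10
    settle_seconds: float = 300.0       # time to wait before judging a collector scale-up
    required_improvement: float = 0.5   # dropped-span rate must fall by at least this fraction
    low_cpu: float = 0.3                # CPU utilisation (0..1) below which scale-down is allowed


@dataclass
class State:
    collectors: int
    scaled_up_at: Optional[float] = None   # time of the last collector scale-up still being judged
    drops_before: Optional[float] = None   # dropped spans/s observed at that scale-up


def decide(policy: Policy, state: State, now: float,
           dropped_per_sec: float, cpu_utilisation: float) -> Action:
    """Pick one scaling action and update `state` to reflect it."""
    if dropped_per_sec > 0:
        if state.scaled_up_at is not None:
            if now - state.scaled_up_at < policy.settle_seconds:
                return Action.NONE  # give the last collector scale-up time to take effect
            improved = dropped_per_sec <= state.drops_before * (1 - policy.required_improvement)
            if not improved:
                # More collectors did not help: the bottleneck is likely the storage.
                state.scaled_up_at = state.drops_before = None
                return Action.SCALE_UP_STORAGE
        if state.collectors < policy.max_collectors:
            state.collectors += 1
            state.scaled_up_at, state.drops_before = now, dropped_per_sec
            return Action.SCALE_UP_COLLECTOR
        # Collector already at its maximum: escalate to the storage.
        state.scaled_up_at = state.drops_before = None
        return Action.SCALE_UP_STORAGE

    # No dropped spans: clear any pending evaluation and consider scaling down.
    state.scaled_up_at = state.drops_before = None
    if cpu_utilisation < policy.low_cpu and state.collectors > policy.min_collectors:
        state.collectors -= 1
        return Action.SCALE_DOWN_COLLECTOR
    return Action.NONE
```

For example, starting from one collector with `max_collectors=3`, a tick reporting 100 dropped spans/s adds a collector; if the rate is still 90 spans/s once the settle time has passed, the function escalates to the storage instead, and a later tick with no drops and low CPU removes the extra collector again. Once the collector is at its maximum, every tick with dropped spans returns `SCALE_UP_STORAGE`, so a real controller would also need a cooldown on storage actions.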

With today's knowledge and tools, I think we can start with a simple approach: use the Kubernetes Horizontal Pod Autoscaler (HPA) to scale up and down based on CPU and memory usage. The idea is that a shortage of workers will push CPU usage close to its limit, and full queues will push memory usage close to its limit as well (once jaegertracing/jaeger#943 is closed).
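For reference, the Kubernetes documentation describes the HPA's core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when the ratio is within a tolerance (0.1 by default), with the largest proposal winning when several metrics are configured. The Python sketch below re-implements only that rule to show how CPU and memory signals would combine for the collector; the function name and parameters are hypothetical, and it omits the HPA's handling of missing metrics, not-yet-ready pods, and the scale-down stabilisation window. Note that HPA resource utilisation is measured against the container's resource *request*, not its limit, so "close to the limit" has to be expressed as a target relative to the request.

```python
import math
from typing import Dict


def desired_replicas(current_replicas: int,
                     utilisation: Dict[str, float],
                     target: Dict[str, float],
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for resource-utilisation metrics.

    `utilisation` and `target` map a resource name ("cpu", "memory") to the
    average utilisation across pods, as a percentage of the pods' resource request.
    """
    proposals = []
    for resource, current in utilisation.items():
        ratio = current / target[resource]
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current_replicas)  # within tolerance: no change for this metric
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # With several metrics, act on whichever proposes the most replicas.
    proposal = max(proposals) if proposals else current_replicas
    # Clamp to the configured replica range.
    return max(min_replicas, min(max_replicas, proposal))
```

For example, with two collector replicas at 90% CPU against a 60% target and 50% memory against an 80% target, CPU proposes ceil(2 × 1.5) = 3 replicas and memory proposes ceil(2 × 0.625) = 2, so the result is 3.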

To me, the only piece of infra that should be scaled for now is the collector. The query service isn't typically used heavily enough to require dynamic scaling, and agents are either scaled with the application (sidecar) or can't be scaled at all (DaemonSet). The only other remaining component is the ingester, which could be dealt with in a second phase.

This leaves scaling of the storage out of the equation for now: we could either add it in a second phase, or delegate this action to the storage's operator.


Labels: enhancement (New feature or request)
