# Task Manager health - average interval

## Context

To auto-scale Kibana, the following rules will determine when Kibana scales up and scales down:

**Scale-up**: Task load > 45% of Task Capacity OR 90th percentile CPU > 0.85
**Scale-down**: Task load < 15% of Task Capacity AND 90th percentile CPU < 0.25

This relies on us being able to calculate the task load and task capacity:

**Task load** = number of scheduled tasks x average task interval
**Task capacity** = number of Kibana instances x task concurrency x (3,600,000 / poll interval), where the poll interval is in milliseconds and 3,600,000 is the number of milliseconds in one hour
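As a rough illustration of how the capacity formula and the scaling rules above fit together, here is a minimal Python sketch. It assumes the task load has already been computed and is expressed in the same unit as the capacity. The function names and the `"hold"` result are hypothetical and are not part of Kibana or Cloud.

```python
MS_PER_HOUR = 3_600_000  # milliseconds in one hour


def task_capacity(instances: int, concurrency: int, poll_interval_ms: int) -> float:
    """Task capacity = instances x task concurrency x (3,600,000 / poll interval)."""
    return instances * concurrency * (MS_PER_HOUR / poll_interval_ms)


def scaling_decision(task_load: float, capacity: float, cpu_p90: float) -> str:
    """Apply the scale-up / scale-down rules; "hold" is a hypothetical label
    for the case where neither rule fires."""
    if task_load > 0.45 * capacity or cpu_p90 > 0.85:
        return "scale-up"
    if task_load < 0.15 * capacity and cpu_p90 < 0.25:
        return "scale-down"
    return "hold"


# Example: 2 instances, concurrency of 10, polling every 3000 ms.
capacity = task_capacity(instances=2, concurrency=10, poll_interval_ms=3000)
decision = scaling_decision(task_load=12_000, capacity=capacity, cpu_p90=0.5)
```

The scale-up check runs first, so a high CPU reading triggers a scale-up even when the task load is low.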

The [Task Manager health API](https://www.elastic.co/guide/en/kibana/master/task-manager-health-monitoring.html) coupled with the information from Cloud provides us with enough information to perform these calculations, except for the ability to determine the average task interval.

The Task Manager health API response does include the workload schedule; however, behind the scenes, it uses an Elasticsearch terms aggregation without a specified `size`, so only the default 10 buckets are returned and any further schedule intervals are left out.
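To make the bucket limit concrete, the sketch below builds a search body with an explicit `size` on the terms aggregation. The field name `task.schedule.interval` and the aggregation name `schedule` are assumptions for illustration, not taken from the Task Manager source. Raising `size` only moves the cap, so this is not a complete fix on its own.

```python
def schedule_terms_aggregation(size: int) -> dict:
    """Build a search body that buckets tasks by their schedule interval.

    Without an explicit "size", a terms aggregation returns only 10 buckets.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "schedule": {  # hypothetical aggregation name
                "terms": {
                    "field": "task.schedule.interval",  # assumed field name
                    "size": size,
                }
            }
        },
    }


body = schedule_terms_aggregation(size=100)
```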

## Feature request

Ideally, the Task Manager health API would return an additional field that specifies the average interval for all recurring tasks in milliseconds. This would allow the auto-scaling logic to use this value in its calculations.

There are likely some complications here because `schedule.interval` currently uses Elasticsearch's [date math notation](https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/date-math-expressions.html) (for example, "1m" represents one minute), and the [avg aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html) doesn't work natively with this field.
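One way around this would be to convert each interval string to milliseconds and then compute a count-weighted average over the schedule buckets. The Python sketch below illustrates the idea. It assumes intervals are a whole number followed by one of `s`, `m`, `h`, or `d`, and that the buckets arrive as a mapping from interval string to task count. The helper names are hypothetical.

```python
import re
from typing import Mapping

# Assumed set of supported units, in milliseconds.
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_INTERVAL = re.compile(r"(\d+)([smhd])")


def interval_to_ms(interval: str) -> int:
    """Convert an interval such as "1m" or "30s" to milliseconds."""
    match = _INTERVAL.fullmatch(interval)
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    value, unit = match.groups()
    return int(value) * UNIT_MS[unit]


def average_interval_ms(buckets: Mapping[str, int]) -> float:
    """Average interval in milliseconds, weighted by the task count per bucket."""
    total_tasks = sum(buckets.values())
    if total_tasks == 0:
        raise ValueError("no recurring tasks to average")
    total_ms = sum(interval_to_ms(i) * count for i, count in buckets.items())
    return total_ms / total_tasks


# Three tasks running every minute and one task running every hour.
average = average_interval_ms({"1m": 3, "1h": 1})
```

The result is only as complete as the buckets passed in, so an average computed from a truncated bucket list would still be skewed.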

Task Manager health - average interval #94937

kobelb
opened on Mar 18, 2021
