Description
opened on Apr 12, 2021
Problem
Currently, the Task Manager health API returns statistics about Task Manager's configuration, workload, and runtime performance. The workload.value.schedule field returns the 10 most frequent intervals for the scheduled tasks, but it does not return the intervals for all scheduled tasks, because returning a "bucket" for every single interval would be infeasible.
As part of the autoscaling Kibana project, we would like to scale Kibana based on task capacity versus scheduled task load. One of the missing data points for this calculation is the average interval across all scheduled tasks, which can't be inferred from the workload.value.schedule field.
Solution
The Task Manager health API should be updated to return a workload.value.average_interval_ms field to support this autoscaling calculation.
Currently, each task document has a task.schedule.interval field; however, this is a keyword field and stores the intervals using Elasticsearch's date interval syntax: 10m for 10 minutes, 100ms for 100 milliseconds. As a result, it's not possible to use the Elasticsearch avg aggregation on the task.schedule.interval field. Instead, a task.schedule.interval_ms field should be added, holding the same interval as a number of milliseconds, so that the Elasticsearch avg aggregation can run efficiently.
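A minimal sketch of the two pieces this implies, under stated assumptions: intervalToMs is a hypothetical helper (not an existing Task Manager function) that derives the numeric value to store in task.schedule.interval_ms, and the supported unit set (ms, s, m, h, d) is an assumption for illustration. The request body uses the standard Elasticsearch avg aggregation; the aggregation name average_interval_ms is chosen here to mirror the proposed health API field.

```typescript
// Assumed unit table for illustration; the units Task Manager actually accepts may differ.
const UNIT_TO_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Hypothetical helper: converts an interval string such as "10m" or "100ms"
// into a number of milliseconds, suitable for a numeric interval_ms field.
function intervalToMs(interval: string): number {
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(interval);
  if (match === null) {
    throw new Error(`Unsupported interval: ${interval}`);
  }
  return Number(match[1]) * UNIT_TO_MS[match[2]];
}

// Search request body that averages the proposed numeric field.
// size: 0 skips returning task documents, since only the aggregation is needed.
const averageIntervalQuery = {
  size: 0,
  aggs: {
    average_interval_ms: {
      avg: { field: 'task.schedule.interval_ms' },
    },
  },
};
```

Storing the precomputed number at write time keeps the health API query to a single avg aggregation, rather than parsing keyword values on every request.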