[Stack Monitoring] /api/v1/monitoring/clusters performance issues

I'm looking at the overview cluster and `/api/v1/monitoring/clusters` is a huge bottleneck. It times out even on shorter time ranges. We're doing a ton of requests to ES, but the bottleneck is mostly CPU in Kibana. The response size is ~30 MB. Some issues I see:

- For the request to get clusters, we use `filter_path` on source fields instead of `_source`. This helps with the response size, but ES still has to get the source fields. We should just use `_source`.
- For every cluster, we end up doing three requests. For the overview cluster this means we are doing 206 * 3 = 618 search requests _in parallel_. There is no throttling.
- We don't use `filter_path` for any of the other requests. Because `_clusters` is pretty big, both ES and Kibana end up spending a lot of CPU cycles on JSON serialization and deserialization.
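
To make the three points above concrete, here is a minimal sketch, not the actual Kibana code. `buildClusterSearchParams` and `mapWithConcurrency` are hypothetical helpers, and the index pattern, field names, and query are placeholders. The first helper combines `_source` filtering in the search body with a `filter_path` that trims the response envelope (dropping sections such as `_clusters`); the second caps how many per-cluster requests are in flight at once.

```typescript
// Hypothetical shape of the search parameters; not taken from the Kibana codebase.
interface SearchParams {
  index: string;
  filter_path: string[];
  body: { size: number; _source: string[]; query: Record<string, unknown> };
}

// Build a search request that asks ES for specific source fields via `_source`
// and trims the rest of the response (e.g. `_clusters`, `_shards`) via `filter_path`.
function buildClusterSearchParams(index: string, sourceFields: string[]): SearchParams {
  return {
    index,
    // Only hits and their source survive in the response that Kibana has to parse.
    filter_path: ['hits.hits._source'],
    body: {
      size: 1,
      // Source filtering: only these fields are returned for each document.
      _source: sourceFields,
      // Placeholder query; the real per-cluster filters would go here.
      query: { match_all: {} },
    },
  };
}

// Run `task` over `items` with at most `limit` tasks in flight at any time.
// Results are returned in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until the list is exhausted.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With something like this, the per-cluster requests could go through `mapWithConcurrency(clusters, 10, fetchClusterDetails)` instead of a bare `Promise.all`, where `fetchClusterDetails` and the limit of 10 are again assumptions. The right limit would need to be measured.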

There are presumably a ton of other improvements we can make. The biggest one, I suspect, is that we don't want to load all the data up front. E.g. if we just list the clusters and some basic metrics, that should be pretty fast. We also do three requests to this endpoint on page load (one for `elasticsearch`, two for `all`).

[Stack Monitoring] /api/v1/monitoring/clusters performance issues #191886

dgieselaar
opened on Aug 31, 2024
