Improve node health checks to detect slow disk IO #59824

@Bukhtawar

This is a follow-up from #52680 (comment). We want to detect slow disk IO, mark the affected node UNHEALTHY, and eventually remove it from the cluster, because a single slow disk can slow down an entire _bulk request for operations such as indexing.

We need to finalise reasonable thresholds for marking the node UNHEALTHY.
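
To make the threshold discussion concrete, here is a minimal sketch of one possible shape for such a check. It is not taken from the Elasticsearch code base: the names `probe_write_latency` and `classify`, the 5 second threshold, and the "3 consecutive slow probes" rule are all hypothetical placeholders. It times a small write plus fsync on the data path and marks the node UNHEALTHY only after several consecutive slow probes, so a single transient stall does not trigger removal.

```python
import os
import tempfile
import time
from enum import Enum


class NodeHealth(Enum):
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"


def probe_write_latency(data_path, payload=b"health-check"):
    """Time one small write + fsync inside data_path; returns seconds."""
    # A temporary probe file is created on the disk under test and
    # removed afterwards, whether or not the write succeeds.
    fd, tmp_name = tempfile.mkstemp(prefix=".health_probe_", dir=data_path)
    try:
        start = time.monotonic()
        os.write(fd, payload)
        os.fsync(fd)  # force the write to the device, not just the page cache
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.remove(tmp_name)


def classify(latencies, slow_threshold_s=5.0, max_slow_probes=3):
    """Return UNHEALTHY only if the last max_slow_probes probes were all slow.

    slow_threshold_s and max_slow_probes are placeholder values (assumptions),
    not proposed defaults.
    """
    recent = latencies[-max_slow_probes:]
    all_slow = all(latency > slow_threshold_s for latency in recent)
    if len(recent) == max_slow_probes and all_slow:
        return NodeHealth.UNHEALTHY
    return NodeHealth.HEALTHY
```

A caller would run `probe_write_latency` on a fixed interval, append each result to a list, and pass that list to `classify`. Requiring consecutive slow probes trades detection speed for fewer false positives; where that balance should sit is exactly the open threshold question.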
