[Feature Request] Counter Metrics to detect leader and follower check failures. #12711
Open
Description
Is your feature request related to a problem? Please describe
With the introduction of the Request Tracing Framework (RTF) built on OpenTelemetry (OTel), metrics (histograms and counters) can now be published and used to track failures.
This issue tracks the instrumentation for two new counter metrics that identify node drops and health check failures for both leader and follower nodes:
- Leader Check Failures -> Failed health checks of the ClusterManager node (leader), performed by follower nodes.
- Follower Check Failures -> Failed health checks of follower nodes, performed by the ClusterManager node (leader).
Describe the solution you'd like
OTel Counter Metrics: Support for Counter type metrics, added as part of #10241, can be used to publish both metrics.
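A rough sketch of how the two counters might be wired, for discussion only. It does not use the actual telemetry API from #10241: `FailureCounter`, `InMemoryCounter`, `HealthCheckFailureMetrics`, and the metric names are all hypothetical stand-ins, chosen so the example compiles on its own. In the real change, the counters would presumably be created through the metrics registry and incremented from the leader and follower checkers' failure paths.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a monotonic OTel-style counter (not the OpenSearch telemetry API).
interface FailureCounter {
    void add(long value);
    long sum();
}

// Thread-safe in-memory counter, used here only so the sketch is runnable.
final class InMemoryCounter implements FailureCounter {
    private final LongAdder total = new LongAdder();

    @Override
    public void add(long value) {
        if (value < 0) {
            // Counters are monotonic, so negative increments are rejected.
            throw new IllegalArgumentException("counter increments must be non-negative");
        }
        total.add(value);
    }

    @Override
    public long sum() {
        return total.sum();
    }
}

// Holds the two counters proposed in this issue.
final class HealthCheckFailureMetrics {
    // Metric names are illustrative assumptions, not agreed names.
    static final String LEADER_CHECK_FAILURES = "example.leader_check.failures";
    static final String FOLLOWER_CHECK_FAILURES = "example.follower_check.failures";

    private final FailureCounter leaderCheckFailures = new InMemoryCounter();
    private final FailureCounter followerCheckFailures = new InMemoryCounter();

    // Would be called on a follower when its health check of the leader fails.
    void onLeaderCheckFailure() {
        leaderCheckFailures.add(1);
    }

    // Would be called on the leader when its health check of a follower fails.
    void onFollowerCheckFailure() {
        followerCheckFailures.add(1);
    }

    long leaderCheckFailureCount() {
        return leaderCheckFailures.sum();
    }

    long followerCheckFailureCount() {
        return followerCheckFailures.sum();
    }
}

final class HealthCheckFailureMetricsDemo {
    public static void main(String[] args) {
        HealthCheckFailureMetrics metrics = new HealthCheckFailureMetrics();
        metrics.onLeaderCheckFailure();
        metrics.onFollowerCheckFailure();
        metrics.onFollowerCheckFailure();
        System.out.println(HealthCheckFailureMetrics.LEADER_CHECK_FAILURES
            + "=" + metrics.leaderCheckFailureCount());
        System.out.println(HealthCheckFailureMetrics.FOLLOWER_CHECK_FAILURES
            + "=" + metrics.followerCheckFailureCount());
    }
}
```

Keeping the two counters separate, rather than using one counter with a role tag, mirrors the two metrics listed above; either layout would work, and the choice is open for discussion.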
Related component
Cluster Manager
Describe alternatives you've considered
No response
Additional context
No response