[Alerting] Task Manager doesn't automatically recover if polling fails #74785

@gmmorris

Task Manager has no built-in ability to recover if the polling cycle fails.
We have previously identified and fixed specific cases where the polling cycle broke, but ideally TM would recover on its own in such cases by restarting the broken poller.

For us to gain full confidence in mission-critical usage of alerting, a Nodemon-like ability to restart the internal poller seems paramount.
Alongside this change, we should expose metrics that can be collected on demand to aid SDH support once we go GA.
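
To make the proposal concrete, here is a rough sketch of what a watchdog around the poller could look like. It is not Task Manager's actual code: `Poller`, `PollerWatchdog`, `PollerHealth`, `maxSilenceMs`, and the callback shape are all hypothetical names chosen for illustration. The idea is that the watchdog records the time of the last successful poll, restarts the poller when it has been silent for too long, and exposes a health snapshot that can be read on demand.

```typescript
// Hypothetical sketch: none of these names come from Task Manager itself.
interface Poller {
  start(): void;
  stop(): void;
}

interface PollerHealth {
  running: boolean;
  restarts: number;
  lastSuccessfulPoll: number | null; // epoch milliseconds
  lastError: string | null;
}

type PollerFactory = (
  onSuccess: () => void,
  onError: (error: Error) => void
) => Poller;

class PollerWatchdog {
  private poller: Poller | null = null;
  private startedAt = 0;
  private restarts = 0;
  private lastSuccessfulPoll: number | null = null;
  private lastError: string | null = null;

  constructor(
    private readonly createPoller: PollerFactory,
    private readonly maxSilenceMs: number,
    // Injectable clock so the staleness check can be tested deterministically.
    private readonly now: () => number = () => Date.now()
  ) {}

  start(): void {
    this.startedAt = this.now();
    this.poller = this.createPoller(
      () => {
        this.lastSuccessfulPoll = this.now();
      },
      (error) => {
        this.lastError = error.message;
      }
    );
    this.poller.start();
  }

  // Intended to be called periodically. Returns true if a restart happened.
  check(): boolean {
    if (this.poller === null) return false;
    // A freshly (re)started poller gets a full grace period before it is
    // considered stale, so we compare against the later of the two times.
    const lastSignOfLife = Math.max(this.lastSuccessfulPoll ?? 0, this.startedAt);
    if (this.now() - lastSignOfLife <= this.maxSilenceMs) return false;

    try {
      this.poller.stop();
    } catch (error) {
      // A broken poller may fail to stop cleanly; record it and carry on.
      this.lastError = error instanceof Error ? error.message : String(error);
    }
    this.restarts += 1;
    this.start();
    return true;
  }

  // On-demand metrics snapshot, e.g. for a support diagnostic.
  health(): PollerHealth {
    return {
      running: this.poller !== null,
      restarts: this.restarts,
      lastSuccessfulPoll: this.lastSuccessfulPoll,
      lastError: this.lastError,
    };
  }
}
```

In this sketch the caller would drive `check()` from a timer (for example `setInterval(() => watchdog.check(), 10_000)`) and surface `health()` wherever diagnostics are collected. Keeping the check separate from the poller means a poller that silently stops emitting results is still detected, not only one that throws.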
