[Stack Monitoring] gracefully handle faulty alert state

### Summary
Stack Monitoring rules in a faulty state (i.e., missing their corresponding saved object) will brick access to the Stack Monitoring application by triggering an infinite retry loop when fetching cluster data. The root cause of the faulty alert state is outside the scope of Stack Monitoring, but we should handle this specific broken case gracefully so the application remains accessible.

There are two code paths that need better error handling:
- the Stack Monitoring landing route tries to get the status of the monitored clusters' rules. [getClustersFromRequest](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/lib/cluster/get_clusters_from_request.ts#L130-L134) should be updated to handle `SavedObjectsClient/notFound` errors gracefully
- when viewing a cluster overview page, the [/alert/{clusterUuid}/status](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/routes/api/v1/alerts/status.ts#L41-L46) route is called, and the `SavedObjectsClient/notFound` error should be handled gracefully there as well
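Both paths could share one small guard around the rule status lookup. The sketch below is illustrative only: `fetchRuleStatusSafely`, `isSavedObjectNotFoundError`, `RuleStatus` and the `savedObjectsErrorType` tag are hypothetical names, and the error shape is an assumption. In Kibana itself the not-found check would presumably go through core's `SavedObjectsErrorHelpers.isNotFoundError` instead of the hand-rolled predicate used here. The idea is to catch only the not-found case, log it, and return an empty status so the caller can keep rendering, while any other error still propagates.

```typescript
// ASSUMPTION: a missing saved object surfaces as an Error tagged with the
// type string `SavedObjectsClient/notFound` on a hypothetical property.
interface TaggedError extends Error {
  savedObjectsErrorType?: string;
}

const NOT_FOUND_TYPE = 'SavedObjectsClient/notFound';

// Hypothetical predicate; real Kibana code would likely use
// SavedObjectsErrorHelpers.isNotFoundError(error) instead.
function isSavedObjectNotFoundError(error: unknown): boolean {
  return (
    error instanceof Error &&
    (error as TaggedError).savedObjectsErrorType === NOT_FOUND_TYPE
  );
}

// Hypothetical, simplified shape of a rule status entry for one cluster.
interface RuleStatus {
  ruleTypeId: string;
  state: string;
}

type RuleStatusFetcher = (clusterUuid: string) => Promise<Record<string, RuleStatus>>;

// Fetch rule status for a cluster, but treat a missing saved object as
// "no rule status" instead of failing the whole request.
async function fetchRuleStatusSafely(
  clusterUuid: string,
  fetchStatus: RuleStatusFetcher,
  log: (message: string) => void = console.warn
): Promise<Record<string, RuleStatus>> {
  try {
    return await fetchStatus(clusterUuid);
  } catch (error) {
    if (isSavedObjectNotFoundError(error)) {
      // Broken rule state: record it and degrade to an empty status.
      log(`Skipping rule status for cluster ${clusterUuid}: ${(error as Error).message}`);
      return {};
    }
    // Any other failure is unexpected and is still propagated to the caller.
    throw error;
  }
}
```

In `getClustersFromRequest`, the cluster's alert data would be filled from this wrapper, and the `/alert/{clusterUuid}/status` handler could return the empty object with a successful response. The trade-off is that alerts for the affected cluster silently disappear from the UI, so the warning log is what keeps the broken rule discoverable.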

**Steps to reproduce**
- ingest Stack Monitoring data and create the default rules
- delete at least one Stack Monitoring rule document from `.kibana_task_manager*`
- the Stack Monitoring application no longer loads

Issue #178845, opened by klacabane on Mar 18, 2024