Opened on Mar 18, 2024
Summary
Stack Monitoring rules in a faulty state (i.e., missing their corresponding saved object) break access to the Stack Monitoring application entirely: it gets stuck in an infinite retry loop while trying to fetch cluster data. The root cause of the faulty rule state is outside the scope of Stack Monitoring, but we should handle this specific broken case gracefully so the application remains accessible.
There are two paths that need improved error handling:
- the landing route for Stack Monitoring tries to get the status of the monitored clusters' rules. `getClustersFromRequest` should be updated to gracefully handle `SavedObjectsClient/notFound` errors.
- when viewing a cluster overview page, the `/alert/{clusterUuid}/status` route is called, and a `SavedObjectsClient/notFound` error should similarly be handled gracefully there.
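A minimal TypeScript sketch of the "skip and report" handling that both paths above could share: a missing saved object for one rule is recorded instead of failing the whole request, while any other error is still rethrown. All names here (`getRuleStatusesSafely`, `FetchRuleStatus`, `SavedObjectNotFoundError`, `isSavedObjectNotFound`, `RuleStatus`) are hypothetical and are not Kibana APIs. Inside Kibana the not-found check would presumably use the core saved objects error helpers (e.g. `SavedObjectsErrorHelpers.isNotFoundError`) rather than the string comparison used here.

```typescript
// Hypothetical shape of a rule status; the real Stack Monitoring types differ.
interface RuleStatus {
  ruleId: string;
  state: string;
}

// Hypothetical error modelling a saved objects "not found" failure.
class SavedObjectNotFoundError extends Error {
  readonly code = "SavedObjectsClient/notFound";
  constructor(message: string) {
    super(message);
    this.name = "SavedObjectNotFoundError";
  }
}

// Hypothetical stand-in for Kibana's saved objects not-found check.
function isSavedObjectNotFound(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "SavedObjectsClient/notFound"
  );
}

// Hypothetical per-rule status lookup supplied by the caller.
type FetchRuleStatus = (clusterUuid: string, ruleId: string) => Promise<RuleStatus>;

// Collects statuses for every rule of a cluster. Rules whose saved object is
// missing are reported in missingRuleIds; any other error is rethrown.
async function getRuleStatusesSafely(
  clusterUuid: string,
  ruleIds: string[],
  fetchRuleStatus: FetchRuleStatus
): Promise<{ statuses: RuleStatus[]; missingRuleIds: string[] }> {
  const statuses: RuleStatus[] = [];
  const missingRuleIds: string[] = [];
  for (const ruleId of ruleIds) {
    try {
      statuses.push(await fetchRuleStatus(clusterUuid, ruleId));
    } catch (err) {
      if (isSavedObjectNotFound(err)) {
        missingRuleIds.push(ruleId);
        continue;
      }
      throw err;
    }
  }
  return { statuses, missingRuleIds };
}
```

Returning `missingRuleIds` alongside the statuses lets the caller render the cluster and flag the broken rules, instead of failing the request because of a single missing saved object.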
Steps to reproduce
- ingest Stack Monitoring data and create the default rules
- delete at least one Stack Monitoring rule document from `.kibana_task_manager*`
- Stack Monitoring no longer loads