[Stack Monitoring] gracefully handle faulty alert state #178845

Open

Description

Summary

Stack monitoring rules in a faulty state (i.e. missing their corresponding saved object) block access to the Stack monitoring application by triggering an infinite retry loop when it tries to get cluster data. The root cause of the faulty alert state is outside of stack monitoring scope, but we should handle this specific broken case gracefully so the application stays accessible.

There are two paths that need improved error handling:

  • the landing route for stack monitoring tries to get the status of the monitored clusters' rules. getClustersFromRequest should be updated to gracefully handle SavedObjectsClient/notFound errors
  • when viewing a cluster overview page, the /alert/{clusterUuid}/status route is called, and the SavedObjectsClient/notFound error should be gracefully handled there as well
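
As a rough illustration of the handling both paths need, here is a minimal TypeScript sketch. Everything in it is hypothetical: `RuleStatus`, `fetchRuleStatusesSafely`, and the `fetchRuleStatus` callback are made-up names, not the actual stack monitoring code, and the sketch assumes the notFound error is a Boom-style error exposing `output.statusCode === 404`. In Kibana itself the check would more likely go through `SavedObjectsErrorHelpers.isNotFoundError`.

```typescript
// Hypothetical shape of a rule status entry; not the real Kibana type.
interface RuleStatus {
  ruleTypeId: string;
  exists: boolean;
  states: unknown[];
}

// Assumption: a saved object notFound error is Boom-style and carries
// output.statusCode === 404. The real code base may expose a helper for this.
function isNotFoundError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) {
    return false;
  }
  const statusCode = (error as { output?: { statusCode?: number } }).output
    ?.statusCode;
  return statusCode === 404;
}

// Hypothetical callback that fetches the status of a single rule.
type FetchRuleStatus = (ruleTypeId: string) => Promise<RuleStatus>;

// Fetch the status of every rule. A rule whose saved object is missing is
// reported as "not existing" instead of failing the whole request; any other
// error is still rethrown so genuine failures stay visible.
async function fetchRuleStatusesSafely(
  ruleTypeIds: string[],
  fetchRuleStatus: FetchRuleStatus
): Promise<RuleStatus[]> {
  return Promise.all(
    ruleTypeIds.map(async (ruleTypeId): Promise<RuleStatus> => {
      try {
        return await fetchRuleStatus(ruleTypeId);
      } catch (error) {
        if (isNotFoundError(error)) {
          return { ruleTypeId, exists: false, states: [] };
        }
        throw error;
      }
    })
  );
}
```

Swallowing only the notFound case keeps the landing page and the cluster overview usable when one rule is broken, while unrelated errors still surface as before.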

Steps to reproduce

  • ingest stack monitoring data and create default rules
  • delete at least one stack monitoring rule document from .kibana_task_manager*
  • observe that stack monitoring no longer loads