Improve Monitoring/Alerting/Metrics #211

Open

Description

Story

As a provider, I want timely alerts raised based on the metrics so that I can make informed decisions.

Motivation

Acceptance Criteria

  • Define alerts for the situations above so that the required action can be taken

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit tests are provided: Have you written automated unit tests?
  • Integration tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added/changed public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide about ops-relevant changes?
  • User documentation: Have you updated the READMEs/docs/how-tos about user-relevant changes?

Possible metrics to add (rough work)

  • Number of machines in each status, so that filtering by status is possible (if not already exposed).
  • Time taken for a machine to join; this would show the overall average joining time on any provider.
  • When MCM scaled up or scaled down, and when CA did.
  • Metrics that could help solve typical DoD issues, such as a node not joining.
  • How long each resource (e.g. VM, disk) took to be created, especially on Azure.


Metadata

Assignees

No one assigned

    Labels

    • area/monitoring: Monitoring (including availability monitoring and alerting) related
    • effort/1m: Effort for issue is around 1 month
    • kind/enhancement: Enhancement, improvement, extension
    • lifecycle/stale: Nobody worked on this for 6 months (will further age)
    • needs/planning: Needs (more) planning with other MCM maintainers
    • platform/all
    • priority/2: Priority (lower number equals higher priority)
    • topology/seed: Affects Seed clusters
