[Stack Monitoring] Change SM rule types to generate discrete alert instances per node

The current `BaseAlert` used to define all of the Stack Monitoring rule types has de-duping logic to avoid alert noise. If a cluster has 20 nodes that suddenly all move into an alertable state for a particular rule, the SM rules will _not_ create 20 alerts. Instead, the rule will create a single alert for the cluster, listing all of the alerting nodes in the message.

The way we do this is to use a custom alert instance ID: https://github.com/elastic/kibana/blob/master/x-pack/plugins/monitoring/server/alerts/base_alert.ts#L286

```
${this.alertOptions.id}:${cluster.clusterUuid}:${firingNodeUuids}
```

where `alertOptions.id` is something like `monitoring_alert_cpu_usage` for the CPU rule type, `clusterUuid` is the id for that ES cluster, and `firingNodeUuids` is an array of currently firing node IDs joined by a `,`. This means that if the list of firing IDs stays constant, this will continue to be one single alert, and actions will be throttled accordingly. However, if that list of firing IDs changes (node(s) stop firing, new node(s) begin firing, etc.), then a new alert instance will be created and new actions will be triggered according to a new throttle schedule. 
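To make the effect of the joined ID concrete, here is a minimal sketch of the scheme described above. `legacyInstanceId` is a hypothetical helper that only mirrors the template string shown earlier; it is not the actual `BaseAlert` code, and the cluster and node IDs are made up.

```typescript
// Hypothetical helper mirroring the template string above (not the real BaseAlert code).
function legacyInstanceId(
  ruleTypeId: string,
  clusterUuid: string,
  firingNodeUuids: string[]
): string {
  // Interpolating an array into a template literal joins its elements with ",".
  return `${ruleTypeId}:${clusterUuid}:${firingNodeUuids}`;
}

// Made-up IDs for illustration only.
const twoNodesFiring = legacyInstanceId('monitoring_alert_cpu_usage', 'cluster-1', [
  'node-a',
  'node-b',
]);
const oneNodeFiring = legacyInstanceId('monitoring_alert_cpu_usage', 'cluster-1', ['node-a']);

// The two IDs differ, so the framework treats them as two unrelated alert instances
// even though node-a never stopped firing.
const sameInstance = twoNodesFiring === oneNodeFiring;
```

Because the ID is derived from the whole firing list, any change in membership yields a different string, which is why the throttle schedule restarts whenever a node joins or leaves the list.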

There are a few problems with this approach, based on [the Alerting docs for the `services.alertInstanceFactory` method](https://github.com/elastic/kibana/tree/master/x-pack/plugins/alerting#alert-instance-factory).

* The docs say, "Note that the id only needs to be unique within the scope of a specific alert, not unique across all alerts or alert types", so we don't need to prefix these instance IDs with the `alertOptions.id`. 
* By implementing our own custom grouping with these alerts, we may accidentally block our ability to incorporate new features that the alerting framework gives us.
  - Resolve action groups are currently tricky, if not impossible, for us to implement because our alerts don't resolve until all nodes on the cluster resolve, although technically each alert instance resolves once the list of firing IDs changes and a new instance is created.
* This ID generation appears to cause problems if a user [happens to create two or more of the same kind of rule](https://github.com/elastic/kibana/issues/91145) from a given rule type (TBD on what exact issues, but [this PR was reverted](https://github.com/elastic/kibana/pull/94167) because of issues that @igoristic reported).

AC:
* Each firing node should generate its own alert instance (its ID can just be its node ID) which will then have the user's throttling rules applied to it individually. 
* Context variables for all SM alert types, along with any default messages, are updated to assume they are per node where applicable.
* The UI that displays these alerts must be able to handle a per-node alert (but also handle the old style for backwards compatibility).
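
As a rough illustration of the first acceptance criterion, the sketch below creates one alert instance per firing node and uses the node ID as the instance ID. The `NodeState` shape, the `InstanceFactory` type, the `'default'` action group, and the context fields are simplified assumptions standing in for `services.alertInstanceFactory` and the real rule executor; this is not the proposed implementation.

```typescript
// Assumed, simplified shape of a per-node evaluation result.
interface NodeState {
  nodeUuid: string;
  clusterUuid: string;
  firing: boolean;
}

// Simplified stand-in for services.alertInstanceFactory (assumption for this sketch).
type InstanceFactory = (id: string) => {
  scheduleActions(actionGroup: string, context: Record<string, unknown>): void;
};

// Create one alert instance per firing node; returns the instance IDs that were scheduled.
function scheduleDiscreteInstances(nodes: NodeState[], factory: InstanceFactory): string[] {
  const scheduled: string[] = [];
  for (const node of nodes) {
    if (!node.firing) {
      continue; // Nodes that are not firing get no instance, so theirs can resolve on its own.
    }
    // The instance ID is just the node ID: no rule type prefix, no joined list.
    factory(node.nodeUuid).scheduleActions('default', {
      nodeUuid: node.nodeUuid,
      clusterUuid: node.clusterUuid,
    });
    scheduled.push(node.nodeUuid);
  }
  return scheduled;
}

// Usage with a recording fake in place of the framework's factory.
const recorded: Array<{ id: string; group: string }> = [];
const fakeFactory: InstanceFactory = (id) => ({
  scheduleActions: (group) => {
    recorded.push({ id, group });
  },
});

const scheduledIds = scheduleDiscreteInstances(
  [
    { nodeUuid: 'node-a', clusterUuid: 'cluster-1', firing: true },
    { nodeUuid: 'node-b', clusterUuid: 'cluster-1', firing: false },
    { nodeUuid: 'node-c', clusterUuid: 'cluster-1', firing: true },
  ],
  fakeFactory
);
```

With instance IDs that stay stable per node, throttling and resolution would apply to each node independently instead of restarting whenever the set of firing nodes changes.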
