[APM] Alerts for throughput and failure rate anomalies #159288

Open

Description

Today we allow users to create anomaly detection jobs (ML Jobs) which will produce anomaly results for latency, throughput and failure rates.
Users can create rules and be alerted when there are anomalies for latency but they have no way of doing the same for throughput and failure rate anomalies.

There is a rule called ApmRuleType.Anomaly, and its user-facing description is:

Alert when either the latency, throughput, or failed transaction rate of a service is anomalous.

This is misleading because the rule does not produce alerts for throughput or failed transaction rate anomalies. It only covers latency, as the term filter below shows:

...termQuery(
  'detector_index',
  getApmMlDetectorIndex(ApmMlDetectorType.txLatency)
),

Solution

It should be possible to receive alerts for throughput and failure rate anomalies. Instead of creating new rules, the existing ApmRuleType.Anomaly rule should be updated to also produce alerts for anomaly types other than latency.
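
One possible shape for this change, as a rough sketch rather than the actual Kibana implementation: replace the single-value term filter with a terms filter built from whichever detector types the rule should cover. Everything below is a self-contained stand-in. `DetectorType`, `getDetectorIndex`, and `detectorIndexFilter` are hypothetical names, and both the `txThroughput` and `txFailureRate` members and the index order are assumptions; only `txLatency` and the `detector_index` field appear in the issue.

```typescript
// Hypothetical stand-in for ApmMlDetectorType. Only txLatency is confirmed
// by the issue; the other two member names are assumptions.
enum DetectorType {
  txLatency = 'txLatency',
  txThroughput = 'txThroughput',
  txFailureRate = 'txFailureRate',
}

// Assumed ordering of detectors inside the ML job. The real mapping lives in
// getApmMlDetectorIndex and may differ.
const DETECTOR_ORDER: DetectorType[] = [
  DetectorType.txLatency,
  DetectorType.txThroughput,
  DetectorType.txFailureRate,
];

// Hypothetical stand-in for getApmMlDetectorIndex.
function getDetectorIndex(type: DetectorType): number {
  return DETECTOR_ORDER.indexOf(type);
}

type TermsFilter = { terms: { detector_index: number[] } };

// Builds a filter clause matching any of the selected detector types.
// Returns an empty array when nothing is selected, so the result can be
// spread into a bool.filter array the same way the existing clause is.
function detectorIndexFilter(types: DetectorType[]): TermsFilter[] {
  if (types.length === 0) {
    return [];
  }
  return [{ terms: { detector_index: types.map(getDetectorIndex) } }];
}

// Example: a rule configured for latency and failure rate anomalies.
const filter = detectorIndexFilter([
  DetectorType.txLatency,
  DetectorType.txFailureRate,
]);
console.log(JSON.stringify(filter));
```

Passing only `DetectorType.txLatency` yields a clause equivalent to today's behaviour, so existing rules could keep working if the selected types default to latency.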

Related enhancement request: https://github.com/elastic/enhancements/issues/12409 (internal)
