Commit 3887055

[Alerting]: get type-checking, tests, and ui working for index threshold (#59064)
This is a follow-on to #57030, "[alerting] initial index threshold alertType and supporting APIs", to get it working with the existing alerting UI. The parameter shapes differed between the two, so the alertType was changed to fit the shapes the existing UI expects.
1 parent bfca202 commit 3887055
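
The parameter renames in this commit can be read as a mapping from the old shape to the new one. The sketch below is illustrative only: `migrateParams`, `OldParams`, `NewParams`, and `COMPARATOR_RENAMES` are hypothetical names that do not appear in the commit, and the field lists are inferred from the README and test changes in the diffs that follow.

```typescript
// Hypothetical helper, not part of the commit: converts the parameter shape
// from #57030 into the shape this commit introduces.
interface OldParams {
  index: string;
  timeField: string;
  aggType: string; // e.g. 'count', 'average'
  aggField?: string;
  groupField?: string;
  groupLimit?: number;
  window: string; // e.g. '5s', '5m'
  comparator: string; // e.g. 'lessThan', 'between'
  threshold: number[];
}

interface NewParams {
  index: string;
  timeField: string;
  aggType: string; // 'average' becomes 'avg'
  aggField?: string;
  groupBy: 'all' | 'top';
  termField?: string;
  termSize?: number;
  timeWindowSize: number;
  timeWindowUnit: string;
  thresholdComparator: string;
  threshold: number[];
}

// Old comparator names mapped to the symbols used after this commit.
const COMPARATOR_RENAMES: Record<string, string> = {
  lessThan: '<',
  lessThanOrEqual: '<=',
  greaterThanOrEqual: '>=',
  greaterThan: '>',
  between: 'between',
  notBetween: 'notBetween',
};

function migrateParams(old: OldParams): NewParams {
  // Split a window such as '5s' into size 5 and unit 's'.
  // Validating the unit itself is out of scope for this sketch.
  const match = /^(\d+)([a-z]+)$/.exec(old.window);
  if (match === null) {
    throw new Error(`cannot parse window: ${old.window}`);
  }
  const thresholdComparator = COMPARATOR_RENAMES[old.comparator];
  if (thresholdComparator === undefined) {
    throw new Error(`unknown comparator: ${old.comparator}`);
  }
  const grouped = old.groupField !== undefined;
  return {
    index: old.index,
    timeField: old.timeField,
    aggType: old.aggType === 'average' ? 'avg' : old.aggType,
    aggField: old.aggField,
    groupBy: grouped ? 'top' : 'all',
    termField: old.groupField,
    termSize: grouped ? old.groupLimit : undefined,
    timeWindowSize: Number(match[1]),
    timeWindowUnit: match[2],
    thresholdComparator,
    threshold: old.threshold,
  };
}
```

Splitting `window` into a numeric size and a unit is what lets the UI bind each part to its own form control, which appears to be the motivation for the new shape.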

18 files changed: +342 -236 lines changed

x-pack/plugins/alerting_builtins/server/alert_types/index_threshold/README.md

Lines changed: 47 additions & 38 deletions
@@ -58,43 +58,47 @@ Finally, create the alert:
 ```
 kbn-alert create .index-threshold 'es-hb-sim threshold' 1s \
 '{
-index: es-hb-sim
-timeField: @timestamp
-aggType: average
-aggField: summary.up
-groupField: monitor.name.keyword
-window: 5s
-comparator: lessThan
-threshold: [ 0.6 ]
+index: es-hb-sim
+timeField: @timestamp
+aggType: avg
+aggField: summary.up
+groupBy: top
+termSize: 100
+termField: monitor.name.keyword
+timeWindowSize: 5
+timeWindowUnit: s
+thresholdComparator: <
+threshold: [ 0.6 ]
 }' \
 "[
 {
-group: threshold met
-id: '$ACTION_ID'
+group: threshold met
+id: '$ACTION_ID'
 params: {
-level: warn
-message: '{{context.message}}'
+level: warn
+message: '{{{context.message}}}'
 }
 }
 ]"
 ```

 This alert will run a query over the `es-hb-sim` index, using the `@timestamp`
-field as the date field, using an `average` aggregation over the `summary.up`
-field. The results are then aggregated by `monitor.name.keyword`. If we ran
+field as the date field, aggregating over groups of the field value
+`monitor.name.keyword` (the top 100 groups), then aggregating those values
+using an `average` aggregation over the `summary.up` field. If we ran
 another instance of `es-hb-sim`, using `host-B` instead of `host-A`, then the
 alert will end up potentially scheduling actions for both, independently.
 Within the alerting plugin, this grouping is also referred to as "instanceIds"
 (`host-A` and `host-B` being distinct instanceIds, which can have actions
 scheduled against them independently).

-The `window` is set to `5s` which is 5 seconds. That means, every time the
+The time window is set to 5 seconds. That means, every time the
 alert runs it's queries (every second, in the example above), it will run it's
 ES query over the last 5 seconds. Thus, the queries, over time, will overlap.
 Sometimes that's what you want. Other times, maybe you just want to do
 sampling, running an alert every hour, with a 5 minute window. Up to the you!

-Using the `comparator` `lessThan` and `threshold` `[0.6]`, the alert will
+Using the `thresholdComparator` `<` and `threshold` `[0.6]`, the alert will
 calculate the average of all the `summary.up` fields for each unique
 `monitor.name.keyword`, and then if the value is less than 0.6, it will
 schedule the specified action (server log) to run. The `message` param
@@ -110,11 +114,10 @@ working:

 ```
 server log [17:32:10.060] [warning][actions][actions][plugins] \
-Server log: alert es-hb-sim threshold instance host-A value 0 \
-exceeded threshold average(summary.up) lessThan 0.6 over 5s \
+Server log: alert es-hb-sim threshold group host-A value 0 \
+exceeded threshold avg(summary.up) < 0.6 over 5s \
 on 2020-02-20T22:32:07.000Z
 ```
-
 [kbn-action]: https://github.com/pmuellr/kbn-action
 [es-hb-sim]: https://github.com/pmuellr/es-hb-sim
 [now-iso]: https://github.com/pmuellr/now-iso
@@ -144,15 +147,18 @@ This example uses [now-iso][] to generate iso date strings.
 ```console
 curl -k "https://elastic:changeme@localhost:5601/api/alerting_builtins/index_threshold/_time_series_query" \
 -H "kbn-xsrf: foo" -H "content-type: application/json" -d "{
-\"index\": \"es-hb-sim\",
-\"timeField\": \"@timestamp\",
-\"aggType\": \"average\",
-\"aggField\": \"summary.up\",
-\"groupField\": \"monitor.name.keyword\",
-\"interval\": \"1s\",
-\"dateStart\": \"`now-iso -10s`\",
-\"dateEnd\": \"`now-iso`\",
-\"window\": \"5s\"
+\"index\": \"es-hb-sim\",
+\"timeField\": \"@timestamp\",
+\"aggType\": \"avg\",
+\"aggField\": \"summary.up\",
+\"groupBy\": \"top\",
+\"termSize\": 100,
+\"termField\": \"monitor.name.keyword\",
+\"interval\": \"1s\",
+\"dateStart\": \"`now-iso -10s`\",
+\"dateEnd\": \"`now-iso`\",
+\"timeWindowSize\": 5,
+\"timeWindowUnit\": \"s\"
 }"
 ```

@@ -184,13 +190,16 @@ To get the current value of the calculated metric, you can leave off the date:
 ```
 curl -k "https://elastic:changeme@localhost:5601/api/alerting_builtins/index_threshold/_time_series_query" \
 -H "kbn-xsrf: foo" -H "content-type: application/json" -d '{
-"index": "es-hb-sim",
-"timeField": "@timestamp",
-"aggType": "average",
-"aggField": "summary.up",
-"groupField": "monitor.name.keyword",
-"interval": "1s",
-"window": "5s"
+"index": "es-hb-sim",
+"timeField": "@timestamp",
+"aggType": "avg",
+"aggField": "summary.up",
+"groupBy": "top",
+"termField": "monitor.name.keyword",
+"termSize": 100,
+"interval": "1s",
+"timeWindowSize": 5,
+"timeWindowUnit": "s"
 }'
 ```

@@ -254,7 +263,7 @@ be ~24 time series points in the output.

 For preview purposes:

-- The `groupLimit` parameter should be used to help cut
+- The `termSize` parameter should be used to help cut
 down on the amount of work ES does, and keep the generated graphs a little
 simpler. Probably something like `10`.

@@ -263,9 +272,9 @@ simpler. Probably something like `10`.
 could result in a lot of time-series points being generated, which is both
 costly in ES, and may result in noisy graphs.

-- The `window` parameter should be the same as what the alert is using,
+- The `timeWindow*` parameters should be the same as what the alert is using,
 especially for the `count` and `sum` aggregation types. Those aggregations
 don't scale the same way the others do, when the window changes. Even for
 the other aggregations, changing the window could result in dramatically
-different values being generated - `averages` will be more "average-y", `min`
+different values being generated - `avg` will be more "average-y", `min`
 and `max` will be a little stickier.
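
The preview guidance in the README hunks above (keep `termSize` small, reuse the alert's own time window) could be applied when building a `_time_series_query` request body. The following is a minimal sketch under stated assumptions: `buildPreviewBody`, `AlertParams`, `PreviewQueryBody`, and `PREVIEW_TERM_SIZE` are hypothetical names, and only the field names come from the curl examples in the README.

```typescript
// Hypothetical types; field names mirror the README's request examples.
interface AlertParams {
  index: string;
  timeField: string;
  aggType: string;
  aggField?: string;
  groupBy: 'all' | 'top';
  termField?: string;
  termSize?: number;
  timeWindowSize: number;
  timeWindowUnit: string;
}

interface PreviewQueryBody extends AlertParams {
  interval: string;
  dateStart: string;
  dateEnd: string;
}

// The README suggests "something like 10" groups for previews.
const PREVIEW_TERM_SIZE = 10;

function buildPreviewBody(
  alert: AlertParams,
  now: Date,
  lookbackMs: number,
  interval: string
): PreviewQueryBody {
  const body: PreviewQueryBody = {
    index: alert.index,
    timeField: alert.timeField,
    aggType: alert.aggType,
    aggField: alert.aggField,
    groupBy: alert.groupBy,
    termField: alert.termField,
    // Same window as the alert, so count/sum values stay comparable.
    timeWindowSize: alert.timeWindowSize,
    timeWindowUnit: alert.timeWindowUnit,
    interval,
    dateStart: new Date(now.getTime() - lookbackMs).toISOString(),
    dateEnd: now.toISOString(),
  };
  if (alert.groupBy === 'top') {
    // Cap the number of groups to limit work in ES and keep graphs readable.
    body.termSize = Math.min(alert.termSize ?? PREVIEW_TERM_SIZE, PREVIEW_TERM_SIZE);
  }
  return body;
}
```

The returned object can be serialized with `JSON.stringify` and sent as the body of the POST shown in the curl examples. A coarser `interval` produces fewer points, which the README notes is cheaper in ES and less noisy.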

x-pack/plugins/alerting_builtins/server/alert_types/index_threshold/action_context.test.ts

Lines changed: 21 additions & 9 deletions
@@ -21,16 +21,20 @@ describe('ActionContext', () => {
       index: '[index]',
       timeField: '[timeField]',
       aggType: 'count',
-      window: '5m',
-      comparator: 'greaterThan',
+      groupBy: 'top',
+      termField: 'x',
+      termSize: 100,
+      timeWindowSize: 5,
+      timeWindowUnit: 'm',
+      thresholdComparator: '>',
       threshold: [4],
     });
     const context = addMessages(base, params);
     expect(context.subject).toMatchInlineSnapshot(
       `"alert [name] group [group] exceeded threshold"`
     );
     expect(context.message).toMatchInlineSnapshot(
-      `"alert [name] group [group] value 42 exceeded threshold count greaterThan 4 over 5m on 2020-01-01T00:00:00.000Z"`
+      `"alert [name] group [group] value 42 exceeded threshold count > 4 over 5m on 2020-01-01T00:00:00.000Z"`
     );
   });

@@ -46,18 +50,22 @@ describe('ActionContext', () => {
     const params = ParamsSchema.validate({
       index: '[index]',
       timeField: '[timeField]',
-      aggType: 'average',
+      aggType: 'avg',
+      groupBy: 'top',
+      termField: 'x',
+      termSize: 100,
       aggField: '[aggField]',
-      window: '5m',
-      comparator: 'greaterThan',
+      timeWindowSize: 5,
+      timeWindowUnit: 'm',
+      thresholdComparator: '>',
       threshold: [4.2],
     });
     const context = addMessages(base, params);
     expect(context.subject).toMatchInlineSnapshot(
       `"alert [name] group [group] exceeded threshold"`
     );
     expect(context.message).toMatchInlineSnapshot(
-      `"alert [name] group [group] value 42 exceeded threshold average([aggField]) greaterThan 4.2 over 5m on 2020-01-01T00:00:00.000Z"`
+      `"alert [name] group [group] value 42 exceeded threshold avg([aggField]) > 4.2 over 5m on 2020-01-01T00:00:00.000Z"`
     );
   });

@@ -74,8 +82,12 @@ describe('ActionContext', () => {
       index: '[index]',
       timeField: '[timeField]',
       aggType: 'count',
-      window: '5m',
-      comparator: 'between',
+      groupBy: 'top',
+      termField: 'x',
+      termSize: 100,
+      timeWindowSize: 5,
+      timeWindowUnit: 'm',
+      thresholdComparator: 'between',
       threshold: [4, 5],
     });
     const context = addMessages(base, params);

x-pack/plugins/alerting_builtins/server/alert_types/index_threshold/action_context.ts

Lines changed: 3 additions & 2 deletions
@@ -47,8 +47,9 @@ export function addMessages(c: BaseActionContext, p: Params): ActionContext {
   );

   const agg = p.aggField ? `${p.aggType}(${p.aggField})` : `${p.aggType}`;
-  const humanFn = `${agg} ${p.comparator} ${p.threshold.join(',')}`;
+  const humanFn = `${agg} ${p.thresholdComparator} ${p.threshold.join(',')}`;

+  const window = `${p.timeWindowSize}${p.timeWindowUnit}`;
   const message = i18n.translate(
     'xpack.alertingBuiltins.indexThreshold.alertTypeContextMessageDescription',
     {
@@ -59,7 +60,7 @@ export function addMessages(c: BaseActionContext, p: Params): ActionContext {
         group: c.group,
         value: c.value,
         function: humanFn,
-        window: p.window,
+        window,
         date: c.date,
       },
     }
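
To illustrate how the new `thresholdComparator`, `timeWindowSize`, and `timeWindowUnit` fields flow into the message, here is a standalone sketch that skips `i18n.translate` and builds the string directly. The template is inferred from the inline snapshots in `action_context.test.ts`, not copied from the translation default; `formatMessage`, `MessageParams`, and `MessageContext` are hypothetical names.

```typescript
// Hypothetical, simplified stand-ins for Params and BaseActionContext.
interface MessageParams {
  aggType: string;
  aggField?: string;
  thresholdComparator: string;
  threshold: number[];
  timeWindowSize: number;
  timeWindowUnit: string;
}

interface MessageContext {
  name: string;
  group: string;
  value: number;
  date: string;
}

function formatMessage(c: MessageContext, p: MessageParams): string {
  // 'count' has no field; other aggregations render as e.g. avg(summary.up).
  const agg = p.aggField ? `${p.aggType}(${p.aggField})` : `${p.aggType}`;
  const humanFn = `${agg} ${p.thresholdComparator} ${p.threshold.join(',')}`;
  // Size and unit are stored separately and rejoined for display, e.g. '5m'.
  const window = `${p.timeWindowSize}${p.timeWindowUnit}`;
  return (
    `alert ${c.name} group ${c.group} value ${c.value} ` +
    `exceeded threshold ${humanFn} over ${window} on ${c.date}`
  );
}
```

Because the comparator is now a symbol such as `>` or `<`, the rendered message may contain characters that Mustache double braces would HTML-escape, which is consistent with the README switching the action's `message` to triple braces.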

x-pack/plugins/alerting_builtins/server/alert_types/index_threshold/alert_type.test.ts

Lines changed: 11 additions & 6 deletions
@@ -6,6 +6,7 @@

 import { loggingServiceMock } from '../../../../../../src/core/server/mocks';
 import { getAlertType } from './alert_type';
+import { Params } from './alert_type_params';

 describe('alertType', () => {
   const service = {
@@ -24,12 +25,14 @@ describe('alertType', () => {
   });

   it('validator succeeds with valid params', async () => {
-    const params = {
+    const params: Partial<Writable<Params>> = {
       index: 'index-name',
       timeField: 'time-field',
       aggType: 'count',
-      window: '5m',
-      comparator: 'greaterThan',
+      groupBy: 'all',
+      timeWindowSize: 5,
+      timeWindowUnit: 'm',
+      thresholdComparator: '<',
       threshold: [0],
     };

@@ -40,12 +43,14 @@ describe('alertType', () => {
     const paramsSchema = alertType.validate?.params;
     if (!paramsSchema) throw new Error('params validator not set');

-    const params = {
+    const params: Partial<Writable<Params>> = {
       index: 'index-name',
       timeField: 'time-field',
       aggType: 'foo',
-      window: '5m',
-      comparator: 'greaterThan',
+      groupBy: 'all',
+      timeWindowSize: 5,
+      timeWindowUnit: 'm',
+      thresholdComparator: '>',
       threshold: [0],
     };

x-pack/plugins/alerting_builtins/server/alert_types/index_threshold/alert_type.ts

Lines changed: 14 additions & 11 deletions
@@ -8,6 +8,7 @@ import { i18n } from '@kbn/i18n';
 import { AlertType, AlertExecutorOptions } from '../../types';
 import { Params, ParamsSchema } from './alert_type_params';
 import { BaseActionContext, addMessages } from './action_context';
+import { TimeSeriesQuery } from './lib/time_series_query';

 export const ID = '.index-threshold';

@@ -46,24 +47,26 @@ export function getAlertType(service: Service): AlertType {
     const { alertId, name, services } = options;
     const params: Params = options.params as Params;

-    const compareFn = ComparatorFns.get(params.comparator);
+    const compareFn = ComparatorFns.get(params.thresholdComparator);
     if (compareFn == null) {
-      throw new Error(getInvalidComparatorMessage(params.comparator));
+      throw new Error(getInvalidComparatorMessage(params.thresholdComparator));
     }

     const callCluster = services.callCluster;
     const date = new Date().toISOString();
     // the undefined values below are for config-schema optional types
-    const queryParams = {
+    const queryParams: TimeSeriesQuery = {
       index: params.index,
       timeField: params.timeField,
       aggType: params.aggType,
       aggField: params.aggField,
-      groupField: params.groupField,
-      groupLimit: params.groupLimit,
+      groupBy: params.groupBy,
+      termField: params.termField,
+      termSize: params.termSize,
       dateStart: date,
       dateEnd: date,
-      window: params.window,
+      timeWindowSize: params.timeWindowSize,
+      timeWindowUnit: params.timeWindowUnit,
       interval: undefined,
     };
     const result = await service.indexThreshold.timeSeriesQuery({
@@ -100,7 +103,7 @@ export function getAlertType(service: Service): AlertType {

 export function getInvalidComparatorMessage(comparator: string) {
   return i18n.translate('xpack.alertingBuiltins.indexThreshold.invalidComparatorErrorMessage', {
-    defaultMessage: 'invalid comparator specified: {comparator}',
+    defaultMessage: 'invalid thresholdComparator specified: {comparator}',
     values: {
       comparator,
     },
@@ -111,10 +114,10 @@ type ComparatorFn = (value: number, threshold: number[]) => boolean;

 function getComparatorFns(): Map<string, ComparatorFn> {
   const fns: Record<string, ComparatorFn> = {
-    lessThan: (value: number, threshold: number[]) => value < threshold[0],
-    lessThanOrEqual: (value: number, threshold: number[]) => value <= threshold[0],
-    greaterThanOrEqual: (value: number, threshold: number[]) => value >= threshold[0],
-    greaterThan: (value: number, threshold: number[]) => value > threshold[0],
+    '<': (value: number, threshold: number[]) => value < threshold[0],
+    '<=': (value: number, threshold: number[]) => value <= threshold[0],
+    '>=': (value: number, threshold: number[]) => value >= threshold[0],
+    '>': (value: number, threshold: number[]) => value > threshold[0],
     between: (value: number, threshold: number[]) => value >= threshold[0] && value <= threshold[1],
     notBetween: (value: number, threshold: number[]) =>
       value < threshold[0] || value > threshold[1],
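
The hunk above is cut off before the map is built, so the following is a self-contained sketch of how the symbol-keyed comparator lookup might be used end to end. `thresholdMet` is a hypothetical wrapper, and the error text reuses the new default message from `getInvalidComparatorMessage` without going through `i18n`.

```typescript
type ComparatorFn = (value: number, threshold: number[]) => boolean;

// Same six comparators as in the diff, keyed by the new symbol names.
const comparatorFns = new Map<string, ComparatorFn>([
  ['<', (value: number, threshold: number[]) => value < threshold[0]],
  ['<=', (value: number, threshold: number[]) => value <= threshold[0]],
  ['>=', (value: number, threshold: number[]) => value >= threshold[0]],
  ['>', (value: number, threshold: number[]) => value > threshold[0]],
  [
    'between',
    (value: number, threshold: number[]) => value >= threshold[0] && value <= threshold[1],
  ],
  [
    'notBetween',
    (value: number, threshold: number[]) => value < threshold[0] || value > threshold[1],
  ],
]);

// Hypothetical wrapper: look up the comparator, fail loudly if it is unknown.
function thresholdMet(thresholdComparator: string, value: number, threshold: number[]): boolean {
  const compareFn = comparatorFns.get(thresholdComparator);
  if (compareFn == null) {
    throw new Error(`invalid thresholdComparator specified: ${thresholdComparator}`);
  }
  return compareFn(value, threshold);
}
```

With the README's example, `thresholdMet('<', 0, [0.6])` evaluates to `true`, matching the logged "value 0 exceeded threshold avg(summary.up) < 0.6". Note that `between` is inclusive at both ends, while `notBetween` is its exact complement.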
