mixin: Fix alert about unhealthy sidecar #2929

hwoarang · 2020-07-23T13:45:04Z

The alert was reporting the wrong information: its $value contained
the number of pods that were failing to send a heartbeat, instead of the
number of seconds each sidecar had been unhealthy.
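To illustrate the difference, here is a minimal Python sketch that models the old evaluation over made-up heartbeat timestamps. The `heartbeats` data, the pod names, and the functions `old_alert_value` and `lag_seconds` are hypothetical and are not Thanos code. Counting lagging sidecars yields a number of pods. The value the alert should carry is each sidecar's lag in seconds.

```python
from typing import Dict


def old_alert_value(heartbeats: Dict[str, float], now: float, threshold: float = 300) -> int:
    """Sketch of the old behaviour: count sidecars lagging by at least `threshold` seconds."""
    return sum(1 for last in heartbeats.values() if now - last >= threshold)


def lag_seconds(heartbeats: Dict[str, float], now: float) -> Dict[str, float]:
    """Per-sidecar lag: seconds elapsed since each pod's last heartbeat."""
    return {pod: now - last for pod, last in heartbeats.items()}


# Hypothetical last-heartbeat timestamps (Unix seconds) for two sidecar pods.
heartbeats = {"sidecar-0": 1000.0, "sidecar-1": 1500.0}
now = 2000.0

print(old_alert_value(heartbeats, now))  # 2: a pod count, not a duration
print(lag_seconds(heartbeats, now))      # {'sidecar-0': 1000.0, 'sidecar-1': 500.0}
```

With the count, a $value of 2 says nothing about how long either sidecar has been unhealthy.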

Also, the 5-minute threshold is probably too short: on large deployments,
Prometheus can take much longer to come online, and the sidecar only
becomes useful once it has.

Instead, we can simply subtract the timestamp of the last heartbeat from
the current time and fire if a sidecar has been lagging for more than 10 minutes.
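A minimal Python sketch of the new check, assuming one last-heartbeat timestamp per sidecar pod. The names `firing_sidecars` and `UNHEALTHY_AFTER_SECONDS` and the sample data are hypothetical. The idea mirrors subtracting the timestamp from PromQL's `time()`: each sidecar's lag is computed separately, and only pods lagging more than 600 seconds are kept, each carrying its own lag as the alert value.

```python
from typing import Dict

# The 10-minute threshold described above, in seconds (hypothetical constant name).
UNHEALTHY_AFTER_SECONDS = 10 * 60


def firing_sidecars(heartbeats: Dict[str, float], now: float,
                    threshold: float = UNHEALTHY_AFTER_SECONDS) -> Dict[str, float]:
    """Return {pod: seconds since last heartbeat} for each sidecar lagging more than `threshold`.

    Each entry stands in for one alert series, whose value is how long that
    sidecar has been unhealthy rather than a count of pods.
    """
    lags = {pod: now - last for pod, last in heartbeats.items()}
    return {pod: lag for pod, lag in lags.items() if lag > threshold}


# Hypothetical last-heartbeat timestamps (Unix seconds).
heartbeats = {"sidecar-0": 1000.0, "sidecar-1": 1500.0}
print(firing_sidecars(heartbeats, now=2000.0))  # {'sidecar-0': 1000.0}
```

Here `sidecar-1` has lagged for 500 seconds, so it stays below the 10-minute threshold and does not fire. This is the behaviour that gives a slow-starting Prometheus time to come up.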

This is also consistent with the current unit tests.

I added a CHANGELOG entry for this change.

Changes

Improve the sidecar alert so that it reports correct information and does not fire too soon

Verification

Deployed locally and performed a Prometheus update; the alert no longer fired.

kakkoyun

lgtm 🥇 It needs rebasing, otherwise we can merge.

hwoarang · 2020-08-04T17:49:35Z

> lgtm 🥇 It needs rebasing otherwise we can merge

@kakkoyun thank you for the approval. I have just rebased it

kakkoyun

I just realized a minor thing. Could you have another look at it?

examples/alerts/alerts.md

kakkoyun · 2020-08-12T07:07:23Z

@hwoarang Friendly ping.

hwoarang · 2020-08-12T09:06:38Z

@kakkoyun Apologies for the delay, but due to holidays, connectivity and time are at a premium :) Nevertheless, I will try to get to it as soon as possible.

hwoarang · 2020-08-12T13:57:28Z

@kakkoyun I believe I have addressed all your concerns now

mixin/alerts/sidecar.libsonnet

Signed-off-by: Markos Chandras <markos@chandras.me>

kakkoyun

lgtm

kakkoyun · 2020-08-12T15:20:15Z

@hwoarang Thanks a lot 🙏

hwoarang force-pushed the fix-alert-for-sidecar branch 7 times, most recently from 6c4a018 to 6678f08, July 28, 2020 14:10

kakkoyun approved these changes Aug 4, 2020

hwoarang force-pushed the fix-alert-for-sidecar branch from 6678f08 to 8960536, August 4, 2020 17:48

kakkoyun requested changes Aug 5, 2020

examples/alerts/alerts.md (resolved)

examples/alerts/alerts.md (outdated, resolved)

hwoarang force-pushed the fix-alert-for-sidecar branch 2 times, most recently from c82af6d to bb529df, August 12, 2020 13:56

hwoarang requested a review from kakkoyun August 12, 2020 13:57

kakkoyun reviewed Aug 12, 2020

mixin/alerts/sidecar.libsonnet (outdated, resolved)

hwoarang force-pushed the fix-alert-for-sidecar branch from bb529df to c3bdfbf, August 12, 2020 14:10

hwoarang requested a review from kakkoyun August 12, 2020 15:00

kakkoyun approved these changes Aug 12, 2020

kakkoyun merged commit d6305f5 into thanos-io:master Aug 12, 2020

hwoarang deleted the fix-alert-for-sidecar branch August 12, 2020 15:31
