Auto-silent on insignificant alarms #346

stkonst · 2020-03-30T06:35:52Z

Hi guys,

We have quite a few alarms appearing on Artemis where peers_seen <= 2 and AS_affected <= 3.

Those alarms are insignificant, and we would never bother to chase them. Is it possible to have a feature in Artemis where such alarms are skipped, auto-ignored, or made to disappear automatically? I am looking forward to your feedback and can provide examples if needed.

Thanks,

Stavros
AMS-IX NOC

vkotronis · 2020-04-07T17:39:54Z

@stkonst This is a good idea!

However, let's discuss the logic a bit so that we can integrate this into the tool properly. According to the hijack states wiki page, a hijack alert goes through some defined states (i.e., a "life-cycle"). Do you mean that if the seen peers and/or the infected ASes stay below user-defined thresholds (e.g., 2 and 3 respectively) for a user-defined interval (e.g., 1 hour), the alert should enter a non-active state, e.g., dormant or insignificant?

The user-provided parameters could be supplied via the .env file (like the rest of them). To implement this, we would need some kind of cleaner robot program (we have done something similar for deleting old BGP updates and making hijacks dormant) that goes through the DB-stored alerts and re-characterizes them. Maybe we could use the "ignored" tag.

My problem with this is that if another BGP update related to the ignored (or auto-silenced) alert comes in, we would have to generate a new alert and repeat the process.

Could you provide some examples here for the 3 parameters (seen peers, infected ASes and no-change-interval), without sharing any private information if possible? It would be interesting to see for how long after detection you actually keep getting BGP updates, even though they point to the same (or a similar) small number of infected ASes and/or seen peers.

We could also continue this discussion on bgpartemis.slack.com for more details.
Thanks for reporting this; I think we could implement something like this, but it would be best to coordinate on potential test cases (and see a few practical examples) before attempting to alter the life-cycle of new alerts. We should also keep in mind that alerts follow a "ramp up - peak - ramp down" pattern; we should not auto-silence alerts that start slow but become critical afterwards (or generate more than one alert in that case).

vkotronis · 2020-05-04T15:12:12Z

The most viable solution, I think, given the current requirements:

New .env variables:
AUTO_IGNORE_NUM_ASES_INFECTED (threshold for number of infected ASes, default=0)
AUTO_IGNORE_NUM_PEERS_SEEN (threshold for number of seen peers, default=0)
AUTO_IGNORE_INTERVAL (interval after which the thresholds are checked and auto-ignore happens)

Implement something similar to here; however, set the alerts to ignored and also clean up redis (see the ignore workflow in the DB module). If the redis cleanup cannot be implemented at the postgres entrypoint, we can use the clock/scheduler microservice and do this periodically (e.g., every minute).
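
As a rough illustration of how a module might read the three proposed variables, here is a minimal Python sketch. The class and function names are hypothetical, and two details are assumptions not stated in this thread: that AUTO_IGNORE_INTERVAL is expressed in seconds, and that it falls back to 0 when unset.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoIgnoreSettings:
    """Hypothetical container for the proposed auto-ignore variables."""
    num_ases_infected: int  # threshold for number of infected ASes
    num_peers_seen: int     # threshold for number of seen peers
    interval_seconds: int   # ASSUMPTION: the interval is given in seconds


def load_auto_ignore_settings(env=None):
    """Read the proposed variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return AutoIgnoreSettings(
        # default=0 for both thresholds, as proposed in this comment
        num_ases_infected=int(env.get("AUTO_IGNORE_NUM_ASES_INFECTED", "0")),
        num_peers_seen=int(env.get("AUTO_IGNORE_NUM_PEERS_SEEN", "0")),
        # ASSUMPTION: no default is stated for the interval; 0 is used here
        interval_seconds=int(env.get("AUTO_IGNORE_INTERVAL", "0")),
    )
```

With both thresholds left at 0, no alert can be below them, so a check written as "below the threshold" would leave the feature effectively switched off by default.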

Workflow: set as ignored all alerts whose infected ASes or seen peers have been below the respective thresholds for longer than the auto-ignore interval (clean up redis and update the DB). I will put together a draft after I discuss this with Stavros.
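
The selection step of this workflow could look roughly like the Python sketch below. It is not the Artemis implementation: the Alert fields and the function name are hypothetical, "below" is read as a strict comparison, and the time since the alert's last update is used as a stand-in for "below the thresholds for longer than the interval". The redis cleanup and DB update are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    """Hypothetical in-memory view of a DB-stored alert."""
    key: str
    infected_ases: int
    peers_seen: int
    last_update: datetime  # time of the last BGP update tied to this alert
    ignored: bool = False


def select_auto_ignorable(alerts, max_ases, max_peers, interval, now):
    """Return the alerts that qualify for auto-ignore at time `now`."""
    selected = []
    for alert in alerts:
        if alert.ignored:
            continue  # already ignored, nothing to do
        # "either their infected ASes or their seen peers are below
        # the respective thresholds" (strict comparison is an assumption)
        low_impact = (alert.infected_ases < max_ases
                      or alert.peers_seen < max_peers)
        # ASSUMPTION: no update for longer than the interval means the
        # counts stayed below the thresholds for that whole period
        quiet_long_enough = (now - alert.last_update) > interval
        if low_impact and quiet_long_enough:
            selected.append(alert)
    return selected
```

A periodic job triggered by the scheduler could call this with the configured thresholds and interval, then mark each returned alert as ignored. Checking the time since the last update also gives some protection against silencing alerts that are still ramping up.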

So changes are needed in the following places:

scheduler (send clock signal)
database (receive clock signal, clean up redis, update DB)
env (plus k8s)
wiki (to explain the new env vars and their utility)

stkonst changed the title ~~Auto-silent low criticality of alarms~~ Auto-silent on insignificant alarms Mar 30, 2020

vkotronis self-assigned this Apr 5, 2020

vkotronis added the automation, detection, docs, frontend, help wanted, logging, and p/medium labels Apr 5, 2020

vkotronis added this to the release-1.4.1 milestone Apr 5, 2020

vkotronis added database and removed detection labels May 4, 2020

vkotronis mentioned this issue May 4, 2020

Autoignore mechanism for hijacks of limited impact/visibility #373

Merged

20 tasks

vkotronis modified the milestones: release-1.5.0, release-1.6.0 May 24, 2020

slowr added the enhancement label Jul 21, 2020

vkotronis closed this as completed in #373 Jul 27, 2020
