Provide useful alert rule samples #1397

Open
rigtorp opened this issue Jun 25, 2019 · 11 comments
Labels
help wanted platform/Linux Linux specific issue

Comments

@rigtorp

rigtorp commented Jun 25, 2019

To provide a better out-of-the-box experience, node_exporter should come with a recommended set of alert rules that provide useful alerts for common Linux system issues.

Currently, when deploying Prometheus and node_exporter, users need to build up their alerts from scratch. This can be challenging for someone who is new to Prometheus and not yet familiar with all the capabilities of PromQL.

@SuperQ SuperQ added help wanted platform/Linux Linux specific issue labels Jun 26, 2019
@aditya-konarde

I can take this one and also #1398 so that the alerts and dashboards are consistent.

@SuperQ do we have any starting points here, or should I go ahead and send a PR with the alerts that I currently have for Prod?

@SuperQ
Member

SuperQ commented Jul 1, 2019

Let's start out by thinking about what kind of alerts we should provide as examples. There are quite a lot of bad node alerts out there. I'd like to avoid recommending the typical "my disk is X% full" kind of alerts that are noisy or non-actionable.

Alerts that we do include should follow best practices laid out by the SRE Book, RED/USE methods, etc.
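For illustration only (this example is not from the thread): one commonly used alternative to a static "disk is X% full" threshold is to alert on when the filesystem is predicted to run out of space. A minimal sketch, assuming node_exporter's `node_filesystem_avail_bytes` metric; the group name, alert name, `fstype` filter, and the 6h/24h/1h windows are hypothetical choices, not recommendations from the maintainers:

```yaml
groups:
  - name: node-example-alerts            # hypothetical group name
    rules:
      - alert: NodeFilesystemFillingUp   # hypothetical alert name
        # Extrapolate the last 6h of free space linearly, 24h into the future,
        # and fire only if the prediction drops below zero bytes available.
        expr: |
          predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[6h], 24 * 60 * 60) < 0
        # Require the condition to hold for an hour to reduce flapping.
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Filesystem {{ $labels.mountpoint }} on {{ $labels.instance }} is predicted to fill up within 24 hours."
```

A file like this can be syntax-checked with `promtool check rules <file>` before it is loaded. The windows and severity would need tuning per environment.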

@aditya-konarde

I'll do a first draft tomorrow and we can then iterate on it.

I looked at the current set of alerts I have, and yours is a good point: we have some 'disk space <10%' kind of alerts too (they shall be removed) ;)

@aditya-konarde

PS: After discussing this with @brancz and doing a quick search, I found out that @tomwilkie already has a PR here: #941

I'm okay with letting Tom finish the PR, and happy to pick it up if he's busy with something else.

@SuperQ
Member

SuperQ commented Jul 1, 2019

You're welcome to take a look at the rules we use. The memory pressure one is pretty useful.
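For readers without access to those rules, here is a rough sketch of what a memory-pressure alert could look like. It is not the rule referred to above: the group and alert names, both thresholds, and the time windows are assumptions made for illustration. It combines low available memory with a sustained rate of major page faults, using the node_exporter metrics `node_memory_MemAvailable_bytes`, `node_memory_MemTotal_bytes`, and `node_vmstat_pgmajfault`:

```yaml
groups:
  - name: node-example-memory          # hypothetical group name
    rules:
      - alert: NodeMemoryPressure      # hypothetical alert name
        # Fire only when little memory is available AND the kernel is
        # servicing many major page faults, i.e. the box is actively thrashing.
        expr: |
          (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.05
          and
          rate(node_vmstat_pgmajfault[5m]) > 500
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} is under memory pressure (low available memory and a high major page fault rate)."
```

Pairing the two conditions is meant to avoid alerting on hosts that merely have a full page cache; the 5% and 500 faults/s values are placeholders.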

@detailyang
Contributor

+1 builtin grafana dashboard

@brancz
Member

brancz commented Jul 2, 2019

There is already work for a monitoring mixin for node exporter: #941

@beorn7 recently finished the work for the Prometheus monitoring mixin and it looks like he’s picking up the above PR judging by the last few comments.

@beorn7
Member

beorn7 commented Jul 2, 2019

If everybody is fine with providing the examples in the form of jsonnet mixins, we should merge all our wisdom in #941. Note, however, that this is different from providing a plain example alert rule file (as you have to install some parts and run jsonnet to create those from the mixins). Mixins are the more flexible and powerful solution, though. (Power users can do a lot with them. Naive users just run make to get the plain YAML file and take it from there.)

@aditya-konarde

@beorn7 +1 to merging efforts towards #941. Does it also make sense to generate the files in an 'examples' directory for naive users to consume, or shall we leave it up to the users to clone locally and run make?

As a super naive user just getting started, it would perhaps be easier to just get some sample alerts from the 'documentation'.

@beorn7
Member

beorn7 commented Jul 2, 2019

We could of course check in the result of the make run, too. Let's keep that in mind and decide once the mixin PR is in a workable shape.

@aditya-konarde

Ack, thanks a lot @beorn7. Do let me know if there's anything I can help with for the PR, but I'll leave it to you so that we don't duplicate efforts :)

6 participants