Provide useful alert rule samples #1397
Comments
Let's start out by thinking about what kind of alerts we should provide as examples. There are quite a lot of bad node alerts out there. I'd like to avoid recommending the typical "my disk is X% full" kind of alerts that are noisy or non-actionable. Alerts that we do include should follow best practices laid out by the SRE Book, RED/USE methods, etc.
I'll do a first draft tomorrow and we can then iterate on it. I looked at the current set of alerts I have, and yours is a good point: we have some 'disk space <10%' kind of alerts too. (They shall be removed.) ;)
PS: After discussing this with @brancz and a quick search later, I found out that @tomwilkie already has a PR here: #941. I'm okay to let Tom finish the PR, and I'm happy to pick it up if he's busy with something else.
You're welcome to take a look at the rules we use. The memory pressure one is pretty useful.
+1 for a built-in Grafana dashboard
If everybody is fine with providing the examples in the form of jsonnet mixins, we should merge all our wisdom in #941. Note, however, that this is different from providing a plain example alert rule file (as you have to install some parts and run jsonnet to create those from the mixins). Mixins are the more flexible and powerful solution, though. (Power users can do a lot with them. Naive users just run |
@beorn7 +1 to merging efforts towards #941. Does it also make sense to generate the files in an 'examples' directory for naive users to consume, or shall we leave it up to the users to clone locally and run As a super naive user just getting started, it would perhaps be easier to just get some sample alerts from the 'documentation'.
We could of course checkin the result of the |
Ack, thanks a lot @beorn7. Do let me know if there's anything I can help with for the PR, but I'll leave it to you so that we don't duplicate efforts :)
To provide a better out-of-the-box experience, node_exporter should come with a recommended set of alert rules that provide useful alerts for common Linux system issues.
Currently, when deploying Prometheus and node_exporter, users need to build up their alerts from scratch. This can be challenging for someone who is new to Prometheus and not yet familiar with all the capabilities of PromQL.
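To make the request concrete, here is a minimal sketch of what one entry in such a rule file might look like. It is not a rule proposed anywhere in this thread. It picks up the point made in the comments about avoiding static "disk is X% full" thresholds and instead alerts on a predicted trend. The group name, alert name, `for` duration, severity label, and filesystem filter are all hypothetical choices for illustration, and the metric name assumes the `node_filesystem_avail_bytes` naming used by node_exporter 0.16 and later.

```yaml
groups:
  - name: node-example-rules            # hypothetical group name
    rules:
      - alert: NodeFilesystemFillingUp  # hypothetical alert name
        # Fit a linear trend to the last 6h of available space and
        # fire if that trend reaches zero within the next 24h.
        expr: |
          predict_linear(
            node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[6h],
            24 * 60 * 60
          ) < 0
        for: 1h                         # assumed; avoids firing on short bursts
        labels:
          severity: warning             # assumed label convention
        annotations:
          summary: "Filesystem {{ $labels.mountpoint }} on {{ $labels.instance }} is predicted to run out of space within 24 hours."
```

A trend-based condition like this is arguably more actionable than a fixed percentage, since it only fires when space is actually being consumed, but the windows and thresholds would need tuning per environment.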