Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the jaeger-mixin for monitoring #1668

Merged
merged 2 commits into from
Aug 7, 2019

Conversation

gouthamve
Copy link
Contributor

I'm submitting this after discussing with @jpkrohling. See other mixin usages here: https://github.com/etcd-io/etcd/tree/master/Documentation/etcd-mixin

This provides a customisable set of alerts and dashboards that can be used to monitor jaeger, in the mixin format that is popularized by the Prometheus community.

@codecov
Copy link

codecov bot commented Jul 18, 2019

Codecov Report

Merging #1668 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1668   +/-   ##
=======================================
  Coverage   98.51%   98.51%           
=======================================
  Files         193      193           
  Lines        9314     9314           
=======================================
  Hits         9176     9176           
  Misses        110      110           
  Partials       28       28

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dcaea9d...a30b0a0. Read the comment docs.

examples/jaeger-mixin/alerts.libsonnet Outdated Show resolved Hide resolved
examples/jaeger-mixin/dashboards.libsonnet Outdated Show resolved Hide resolved
examples/jaeger-mixin/dashboards.libsonnet Outdated Show resolved Hide resolved
examples/jaeger-mixin/dashboards.libsonnet Outdated Show resolved Hide resolved
examples/jaeger-mixin/README.md Outdated Show resolved Hide resolved
examples/jaeger-mixin/README.md Outdated Show resolved Hide resolved
@jpkrohling
Copy link
Contributor

Another question: should we provide a pre-built version of the alerts and dashboards, for users who don't want/can't install jsonnet and/or jb?

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all to diminish the value of this PR, but Is there a UI that can edit the dashboard? I can tolerate editing alert definitions manually (even then - bleh), but editing dashboard files is beyond poor user experience. At Uber we take the approach that you have a UI for editing these complex configs, but the UI generates a code diff in the end, so that we can treat configuration as code, go through code review, etc.

About the alert file, does the format allow including some descriptions or links to run books? I find it very bad practice to have alerts that only include some cryptic “name”, esp. if someone gets woken up by them in the middle of the night. If we know enough about a condition to decide that it warrants an alert, then alert should include that explanation, and, preferably, some pointers to remediation steps.

@jpkrohling
Copy link
Contributor

Is there a UI that can edit the dashboard?

The dashboards can indeed be changed directly in Grafana, once they are loaded there. Note however that we'd either need to provide the dashboard JSON files generated from the mixins, or have users generate the JSON by themselves.

About the alert file, does the format allow including some descriptions or links to run books?

Good point -- I don't know the answer myself, but I think we could extend our documentation in the future to include a description about those alerts.

@gouthamve
Copy link
Contributor Author

gouthamve commented Jul 18, 2019 via email

@yurishkuro
Copy link
Member

How to mix runbooks for mixins is an ongoing discussion in the mixin community and there will soon be a way to distribute the runbooks too

I was thinking of something simpler, a description field that can contain the explanation of the meaning of the alert and suggestions for remediation, or just a link to a doc (the same explanations would make sense to exist in the documentation).

@jpkrohling
Copy link
Contributor

@gouthamve do you have any news on this one?

@jpkrohling
Copy link
Contributor

@gouthamve I just opened #1700 with the changes discussed here. If you had them implemented already, let me know and I'll close my PR in favor of yours. Otherwise, feel free to close this one and comment on that one. I tried to keep you as the author of the original commit.

@jpkrohling
Copy link
Contributor

@objectiser, @yurishkuro, could you please review again? Once this is merged, I'll try to get feedback from people already using Prometheus to monitor Jaeger to see if they can help us improve the mixins.

monitoring/jaeger-mixin/README.md Show resolved Hide resolved
monitoring/jaeger-mixin/README.md Outdated Show resolved Hide resolved
monitoring/jaeger-mixin/README.md Show resolved Hide resolved
monitoring/jaeger-mixin/README.md Outdated Show resolved Hide resolved
monitoring/jaeger-mixin/README.md Show resolved Hide resolved
monitoring/jaeger-mixin/README.md Show resolved Hide resolved
@jpkrohling
Copy link
Contributor

@yurishkuro, @objectiser is there anything else pending, or can this one be merged?

Copy link
Contributor

@objectiser objectiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to get an answer about the annotations - but that isn't necessarily a blocker. We could evolve this area with feedback from the community.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have objections to merging. This seems highly experimental, so best to get feedback. One general comment is I think the files could use some comments, e.g. when I think back to sample config files for Cassandra or httpd, every property there is documented.

monitoring/jaeger-mixin/README.md Outdated Show resolved Hide resolved
monitoring/jaeger-mixin/README.md Show resolved Hide resolved
monitoring/jaeger-mixin/README.md Show resolved Hide resolved
@jpkrohling jpkrohling mentioned this pull request Aug 6, 2019
3 tasks
gouthamve and others added 2 commits August 7, 2019 10:12
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling
Copy link
Contributor

Rebased, to see if we can get past of the build failures (master now seems to have travis_retry in key steps)

@jpkrohling jpkrohling merged commit aa76cf0 into jaegertracing:master Aug 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants