-
Notifications
You must be signed in to change notification settings - Fork 1.2k
✨ Adopt native histograms #3165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Adopt native histograms #3165
Conversation
Welcome @krisztianfekete! |
Hi @krisztianfekete. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@@ -60,6 +62,9 @@ var ( | |||
Help: "Length of time per reconciliation per controller", | |||
Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, | |||
1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60}, | |||
NativeHistogramBucketFactor: 1.1, | |||
NativeHistogramMaxBucketNumber: 100, | |||
NativeHistogramMinResetDuration: 1 * time.Hour, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this, can we please do it for all histograms and not just one? And what was the reasoning behind choosing that NativeHistogramBucketFactor
and NativeHistogramMaxBucketNumber
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me also add it for workqueue metrics too!
The values I've set are sensible defaults that most project starts with (us included).
Please see the following references, reasoning:
NativeHistogramBucketFactor
: https://github.com/prometheus/client_golang/blob/331dfab0cc853dca0242a0d96a80184087a80c1d/prometheus/histogram.go#L405-L429NativeHistogramMaxBucketNumber
: I've seen this being set to 100 usually, unless it is desired to mimic Exponential Histograms in OTel closer, in which case this can also be 160.NativeHistogramMinResetDuration
: "If at least NativeHistogramMinResetDuration has passed since the last reset of the histogram (which includes the creation of the histogram), the whole histogram is reset, i.e. all buckets are deleted and the sum and count of observations as well as the zero bucket are set to zero. Prometheus handles this as a normal counter reset, which means that some observations will be lost between scrapes, so resetting should happen rarely compared to the scraping interval. Additionally, frequent counter resets might lead to less efficient storage in the TSDB (see the TSDB section for details). A NativeHistogramMinResetDuration of one hour is a value that should work well in most situations." source: https://prometheus.io/docs/specs/native_histograms/#limiting-the-bucket-count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one more:
RequestLatency = prometheus.NewHistogramVec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, not sure how could I miss that. Thanks, just pushed a commit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM label has been added. Git tree hash: 980ecec5bd1e7ae43a30399ea1e0038b9037ab8b
|
/ok-to-test |
I'll take a look |
Thanks, @sbueringer, looking forward to the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a closer look and tested it with Prometheus / Grafana / Cluster API. Looks nice!
One smaller finding
@@ -60,6 +62,9 @@ var ( | |||
Help: "Length of time per reconciliation per controller", | |||
Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, | |||
1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60}, | |||
NativeHistogramBucketFactor: 1.1, | |||
NativeHistogramMaxBucketNumber: 100, | |||
NativeHistogramMinResetDuration: 1 * time.Hour, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one more:
RequestLatency = prometheus.NewHistogramVec( |
Thank you! /lgtm |
LGTM label has been added. Git tree hash: a6e655220d13e99e4fbfa57657f9d60347d788fd
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alvaroaleman, krisztianfekete, sbueringer The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Adopts Native Histograms in an opt-in and backward-compatible manner to overcome the limitations of traditional histograms.
This solves #3164