✨ Adopt native histograms #3165
Conversation

Welcome @krisztianfekete!
Hi @krisztianfekete. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
```go
		1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60},
	NativeHistogramBucketFactor:     1.1,
	NativeHistogramMaxBucketNumber:  100,
	NativeHistogramMinResetDuration: 1 * time.Hour,
```
If we do this, can we please do it for all histograms and not just one? And what was the reasoning behind choosing that NativeHistogramBucketFactor and NativeHistogramMaxBucketNumber?
Sure, let me add it for the workqueue metrics too!
The values I've set are sensible defaults that most projects start with (us included).
Please see the following references for the reasoning:
- `NativeHistogramBucketFactor`: https://github.com/prometheus/client_golang/blob/331dfab0cc853dca0242a0d96a80184087a80c1d/prometheus/histogram.go#L405-L429
- `NativeHistogramMaxBucketNumber`: I've usually seen this set to 100, unless it is desired to mimic OTel's Exponential Histograms more closely, in which case it can also be 160.
- `NativeHistogramMinResetDuration`: "If at least NativeHistogramMinResetDuration has passed since the last reset of the histogram (which includes the creation of the histogram), the whole histogram is reset, i.e. all buckets are deleted and the sum and count of observations as well as the zero bucket are set to zero. Prometheus handles this as a normal counter reset, which means that some observations will be lost between scrapes, so resetting should happen rarely compared to the scraping interval. Additionally, frequent counter resets might lead to less efficient storage in the TSDB (see the TSDB section for details). A NativeHistogramMinResetDuration of one hour is a value that should work well in most situations." Source: https://prometheus.io/docs/specs/native_histograms/#limiting-the-bucket-count
There's one more:
Hah, not sure how I could miss that. Thanks, just pushed a commit!
LGTM label has been added. Git tree hash: 980ecec5bd1e7ae43a30399ea1e0038b9037ab8b

/ok-to-test

I'll take a look

Thanks, @sbueringer, looking forward to the review!
sbueringer left a comment
Took a closer look and tested it with Prometheus / Grafana / Cluster API. Looks nice!
One small finding:
```go
		1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60},
	NativeHistogramBucketFactor:     1.1,
	NativeHistogramMaxBucketNumber:  100,
	NativeHistogramMinResetDuration: 1 * time.Hour,
```
There's one more:
Thank you! /lgtm

LGTM label has been added. Git tree hash: a6e655220d13e99e4fbfa57657f9d60347d788fd
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, krisztianfekete, sbueringer

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
* adopt native histograms
* add native histograms to workqueue metrics too
* adopt native histograms for admission histogram
Adopts Native Histograms in an opt-in and backward-compatible manner to overcome the limitations of traditional histograms.
This solves #3164