Skip to content

✨ Adopt native histograms #3165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

krisztianfekete
Copy link
Contributor

@krisztianfekete krisztianfekete commented Mar 18, 2025

Adopts Native Histograms in an opt-in and backward-compatible manner to overcome the limitations of traditional histograms.

This solves #3164

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 18, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 18, 2025
@k8s-ci-robot
Copy link
Contributor

Welcome @krisztianfekete!

It looks like this is your first PR to kubernetes-sigs/controller-runtime 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/controller-runtime has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @krisztianfekete. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 18, 2025
@@ -60,6 +62,9 @@ var (
Help: "Length of time per reconciliation per controller",
Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60},
NativeHistogramBucketFactor: 1.1,
NativeHistogramMaxBucketNumber: 100,
NativeHistogramMinResetDuration: 1 * time.Hour,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this, can we please do it for all histograms and not just one? And what was the reasoning behind choosing that NativeHistogramBucketFactor and NativeHistogramMaxBucketNumber?

Copy link
Contributor Author

@krisztianfekete krisztianfekete Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me also add it for workqueue metrics too!
The values I've set are sensible defaults that most project starts with (us included).

Please see the following references, reasoning:

  • NativeHistogramBucketFactor: https://github.com/prometheus/client_golang/blob/331dfab0cc853dca0242a0d96a80184087a80c1d/prometheus/histogram.go#L405-L429
  • NativeHistogramMaxBucketNumber: I've seen this being set to 100 usually, unless it is desired to mimic Exponential Histograms in OTel closer, in which case this can also be 160.
  • NativeHistogramMinResetDuration: "If at least NativeHistogramMinResetDuration has passed since the last reset of the histogram (which includes the creation of the histogram), the whole histogram is reset, i.e. all buckets are deleted and the sum and count of observations as well as the zero bucket are set to zero. Prometheus handles this as a normal counter reset, which means that some observations will be lost between scrapes, so resetting should happen rarely compared to the scraping interval. Additionally, frequent counter resets might lead to less efficient storage in the TSDB (see the TSDB section for details). A NativeHistogramMinResetDuration of one hour is a value that should work well in most situations." source: https://prometheus.io/docs/specs/native_histograms/#limiting-the-bucket-count

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one more:

RequestLatency = prometheus.NewHistogramVec(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, not sure how could I miss that. Thanks, just pushed a commit!

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 19, 2025
Copy link
Member

@alvaroaleman alvaroaleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

Thanks, makes sense to me.

@sbueringer any concerns here?

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 21, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 21, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 980ecec5bd1e7ae43a30399ea1e0038b9037ab8b

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 21, 2025
@alvaroaleman
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 21, 2025
@sbueringer
Copy link
Member

I'll take a look

@krisztianfekete
Copy link
Contributor Author

Thanks, @sbueringer, looking forward to the review!

Copy link
Member

@sbueringer sbueringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a closer look and tested it with Prometheus / Grafana / Cluster API. Looks nice!

One smaller finding

@@ -60,6 +62,9 @@ var (
Help: "Length of time per reconciliation per controller",
Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60},
NativeHistogramBucketFactor: 1.1,
NativeHistogramMaxBucketNumber: 100,
NativeHistogramMinResetDuration: 1 * time.Hour,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one more:

RequestLatency = prometheus.NewHistogramVec(

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 21, 2025
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 21, 2025
@sbueringer sbueringer added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Mar 21, 2025
@sbueringer
Copy link
Member

Thank you!

/lgtm
/approve
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 21, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 21, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: a6e655220d13e99e4fbfa57657f9d60347d788fd

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, krisztianfekete, sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alvaroaleman,sbueringer]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 53c0518 into kubernetes-sigs:main Mar 21, 2025
11 checks passed
@krisztianfekete krisztianfekete deleted the adopt-native-histograms branch March 21, 2025 15:48
@alvaroaleman alvaroaleman linked an issue Mar 22, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adopt Native Histograms
4 participants