Add the histogram bucket bridge #3937
Conversation
Force-pushed from 6277c80 to eebc57c
Force-pushed from 35b45d8 to d4ae324
pkg/aggregator/check_sampler.go (Outdated)

    // simple linear interpolation, TODO: optimize
    if math.IsInf(bucket.UpperBound, 1) {
        // Arbitrarily double the lower bucket value for interpolation over infinity bucket
This is likely going to require some documentation. Is there any guideline or standard practice around this in the community?
I wasn't able to find any guideline except what is currently done in the histogram_quantile function, which requires the infinity bucket but returns the closest bucket bound if the quantile falls into it: if the quantile falls into the highest bucket, the upper bound of the 2nd highest bucket is returned.
Ah. That's confusing and unfortunate, but at least it's deterministic. I'll never understand why they don't just add a max metric. But anyway, for now we should mimic this, I guess?
@jbarciauskas I'm not sure how to emulate this behavior with the sketch. Don't try to interpolate anything with the infinity bucket? (That would mess up the summary.)
Or interpolate everything close to/at the value of "the upper bound of the 2nd highest"? (Close to what I'm currently doing.)
Yeah, I mean report the count of values in the infinity bucket as the last defined bucket value, essentially as if they were all = max. Either choice (2nd to last bucket = max or 2*2nd to last bucket = max) is synthesizing data that doesn't exist. The first one means that query results should be similar at least.
Just changed the logic to insert everything at the lower_bound of the infinity bucket.
Choosing the lower_bound makes sense to me, but this needs to be very loudly documented, since max is ~always going to be wrong in fairly surprising ways.
Force-pushed from 767da5f to 6121adf
Force-pushed from 6121adf to b39d8ad
Force-pushed from b39d8ad to 42789bd
    Name: ctx.Name,
    Tags: ctx.Tags,
    Host: ctx.Host,
    // Interval: TODO: investigate
@jbarciauskas I'm not sure how to handle this field in the context of checks?
Do checks run at a standard interval? I think it's only important for counts
@jbarciauskas they usually do (15 sec), but a custom check interval can be defined per check
Codecov Report
@@ Coverage Diff @@
## master #3937 +/- ##
=========================================
Coverage ? 52.11%
=========================================
Files ? 631
Lines ? 45673
Branches ? 0
=========================================
Hits ? 23801
Misses ? 20478
Partials ? 1394
Reviewed everything except the rt_loader part. Could you ask agent-core to have a look? I'm not sure my review would be useful there. Also interested in your opinion on insertN, @jbarciauskas, or from someone else familiar with the sketch implementation.
    a.Sketch.Basic.InsertN(v, n)
    for i := 0; i < int(n); i++ {
        a.Buf = append(a.Buf, agentConfig.key(v))
This part would benefit from having a bulk insert in the sparse store:

datadog-agent/pkg/quantile/store.go, line 133 in e96af32:

    func (s *sparseStore) insert(c *Config, keys []Key) {
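The suggestion above could look something like the sketch below: rather than appending one key per sample (as the loop in the diff does), expand (key, count) pairs in a single pass, or better, pass counts through to a count-aware store API. The `Key` alias and `keysWithCounts` helper are hypothetical; only the `insert(c *Config, keys []Key)` signature quoted above is from the repository.

```go
package main

import "fmt"

// Key stands in for the sketch store's bucket key type (hypothetical alias).
type Key int16

// keysWithCounts is a hypothetical bulk-insert helper: it expands
// (key, count) pairs into the flat key slice that sparseStore.insert
// expects, pre-sizing the slice once instead of growing it per sample.
// A truly count-aware store method would avoid even this expansion.
func keysWithCounts(pairs map[Key]int) []Key {
	total := 0
	for _, n := range pairs {
		total += n
	}
	out := make([]Key, 0, total)
	for k, n := range pairs {
		for i := 0; i < n; i++ {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := keysWithCounts(map[Key]int{3: 2, 7: 1})
	fmt.Println(len(keys)) // → 3
}
```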
Looks good, only a few nits here and there
    ---
    features:
      - |
        [preview] Checks can now send histogram buckets to the agent to be sent as distribution metrics.
Will this feature get enabled by default for people using wildcards in prometheus checks?
No, it'll be disabled by default, so nothing should change unless you enable the send_distribution_buckets flag on purpose; see DataDog/integrations-core#4321.
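For illustration, enabling the flag might look like the instance config below. This is a hypothetical fragment: the exact key placement and surrounding options should be checked against DataDog/integrations-core#4321; only the send_distribution_buckets option name is from this thread.

```yaml
# Hypothetical check instance config; the option is disabled by default.
instances:
  - prometheus_url: http://localhost:9090/metrics
    send_distribution_buckets: true
```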
Summarizing feedback from @Daniel-B-Smith and me:
- The efficiency is poor and scales unintuitively (with counts), but substantial improvements are at least 1-2 weeks of effort
- There are potential accuracy improvements that will come along with that, but they will be incremental
If we're OK with those caveats, we're fine moving ahead with this and iterating on it.
What does this PR do?
Motivation
Send prometheus/openmetrics histograms from checks to submit them as distribution metrics.
Additional Notes
Performance on the vanilla linear interpolation implementation, with an interpolation granularity of 1000, using insertN: this speeds up the Summary computation, but the sparseStore still has values inserted one by one.
Integrations-core PR: DataDog/integrations-core#4321