
Add the histogram bucket bridge #3937

Merged: 16 commits, Aug 23, 2019

Conversation

@mfpierre (Contributor) commented Jul 25, 2019

What does this PR do?

  • Add the histogram bucket bridge
  • Add sketch support to the check sampler

Motivation

Send Prometheus/OpenMetrics histograms from checks so they can be submitted as distribution metrics.

Additional Notes

Performance on the vanilla linear interpolation implementation:

BenchmarkAddBucket1-4          	 1000000	      1394 ns/op	     850 B/op	      10 allocs/op
BenchmarkAddBucket100-4        	  100000	     11483 ns/op	    1217 B/op	      12 allocs/op
BenchmarkAddBucket10000-4      	    2000	    702976 ns/op	   42537 B/op	     224 allocs/op
BenchmarkAddBucket1000000-4    	      20	  66036465 ns/op	 4174570 B/op	   21495 allocs/op
BenchmarkAddBucket10000000-4   	       2	 679800800 ns/op	41738508 B/op	  214864 allocs/op

With an interpolation granularity of 1000 using insertN:

BenchmarkAddBucket1-4          	 1000000	      1652 ns/op	     874 B/op	      11 allocs/op
BenchmarkAddBucket10-4         	 1000000	      2461 ns/op	     909 B/op	      11 allocs/op
BenchmarkAddBucket100-4        	  100000	     12185 ns/op	    1241 B/op	      13 allocs/op
BenchmarkAddBucket1000-4       	   20000	     98839 ns/op	    4997 B/op	      32 allocs/op
BenchmarkAddBucket10000-4      	    5000	    397713 ns/op	   81300 B/op	     241 allocs/op
BenchmarkAddBucket1000000-4    	      50	  25824508 ns/op	 4185665 B/op	   12011 allocs/op
BenchmarkAddBucket10000000-4   	       5	 283451953 ns/op	96373139 B/op	   21017 allocs/op

This speeds up the Summary computation, but the sparseStore still has values inserted one by one.
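The linear interpolation being benchmarked above can be pictured with a small sketch: given a bucket's bounds and count, synthesize evenly spaced values to feed into the sketch. The function name and shape here are illustrative only, not the agent's actual code.

```go
package main

import "fmt"

// interpolateBucket spreads `count` synthetic samples evenly across
// [lower, upper], placing each sample at the midpoint of its sub-interval.
// Illustrative sketch of the linear-interpolation idea, not the agent's API.
func interpolateBucket(lower, upper float64, count int) []float64 {
	if count <= 0 {
		return nil
	}
	samples := make([]float64, 0, count)
	step := (upper - lower) / float64(count)
	for i := 0; i < count; i++ {
		samples = append(samples, lower+step*(float64(i)+0.5))
	}
	return samples
}

func main() {
	fmt.Println(interpolateBucket(0, 10, 4)) // prints [1.25 3.75 6.25 8.75]
}
```

The "granularity of 1000" variant above caps how many synthetic points get generated per bucket and relies on InsertN to insert each point with a weight, which is why its allocations scale much better at high counts.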

Integrations-core PR: DataDog/integrations-core#4321

@mfpierre mfpierre added this to the 6.14.0 milestone Jul 25, 2019
@mfpierre mfpierre force-pushed the mfpierre/histogram-bucket-bridge branch 3 times, most recently from 6277c80 to eebc57c Compare July 26, 2019 11:57
@mfpierre mfpierre force-pushed the mfpierre/histogram-bucket-bridge branch 2 times, most recently from 35b45d8 to d4ae324 Compare August 9, 2019 09:53

// simple linear interpolation, TODO: optimize
if math.IsInf(bucket.UpperBound, 1) {
// Arbitrarily double the lower bucket value for interpolation over infinity bucket
Contributor:
This is likely going to require some documentation, is there any guideline or standard practice around this in the community?

Contributor Author @mfpierre, Aug 16, 2019:

I wasn't able to find any guideline beyond what is currently done in the histogram_quantile function, which requires the infinity bucket but returns the closest bucket if the quantile falls into it:

if the quantile falls into the highest bucket, the upper bound of the 2nd highest bucket is returned

ref https://github.com/prometheus/prometheus/blob/41151ca8dc069448515f48893b8631b9a3ad8df8/promql/quantile.go#L49-L70

Contributor:

Ah. That's confusing and unfortunate, but at least it's deterministic. I'll never understand why they don't just add a max metric. But anyway, for now we should mimic this, I guess?

Contributor Author:

@jbarciauskas I'm not sure how to emulate this behavior with the sketch. Should we not interpolate anything for the infinity bucket? (That would mess up the summary.) Or interpolate everything close to or at "the upper bound of the 2nd highest"? (Close to what I'm currently doing.)

Contributor:

Yeah, I mean report the count of values in the infinity bucket as the last defined bucket value, essentially as if they were all = max. Either choice (2nd to last bucket = max or 2*2nd to last bucket = max) is synthesizing data that doesn't exist. The first one means that query results should be similar at least.

Contributor Author:

Just changed the logic to insert everything at the lower_bound of the infinity bucket.


Choosing the lower_bound makes sense to me, but this needs to be very loudly documented since max is ~always going to be wrong in fairly surprising ways.

@mfpierre mfpierre force-pushed the mfpierre/histogram-bucket-bridge branch from 767da5f to 6121adf Compare August 16, 2019 15:26
@mfpierre mfpierre force-pushed the mfpierre/histogram-bucket-bridge branch from 6121adf to b39d8ad Compare August 16, 2019 16:29
@mfpierre mfpierre marked this pull request as ready for review August 16, 2019 17:36
@mfpierre mfpierre requested review from a team as code owners August 16, 2019 17:36
@mfpierre mfpierre requested a review from a team August 16, 2019 17:36
@mfpierre mfpierre force-pushed the mfpierre/histogram-bucket-bridge branch from b39d8ad to 42789bd Compare August 16, 2019 17:55
Name: ctx.Name,
Tags: ctx.Tags,
Host: ctx.Host,
// Interval: TODO: investigate
Contributor Author:

@jbarciauskas I'm not sure how to handle this field in the context of checks?

Contributor:

Do checks run at a standard interval? I think it's only important for counts

Contributor Author:

@jbarciauskas they usually do (every 15 seconds), but a custom check interval can be defined per check.

codecov bot commented Aug 20, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@a654ff7).
The diff coverage is 70.3%.


@@            Coverage Diff            @@
##             master    #3937   +/-   ##
=========================================
  Coverage          ?   52.11%           
=========================================
  Files             ?      631           
  Lines             ?    45673           
  Branches          ?        0           
=========================================
  Hits              ?    23801           
  Misses            ?    20478           
  Partials          ?     1394
Flag Coverage Δ
#linux 56.62% <71.62%> (?)
#windows 52.51% <70.3%> (?)
Impacted Files Coverage Δ
pkg/collector/python/init.go 3.87% <ø> (ø)
pkg/quantile/agent.go 72.22% <0%> (ø)
pkg/aggregator/mocksender/asserts.go 75.86% <0%> (ø)
pkg/metrics/histogram_bucket.go 0% <0%> (ø)
pkg/aggregator/mocksender/mocked_methods.go 13.79% <0%> (ø)
pkg/aggregator/mocksender/mocksender.go 95% <100%> (ø)
pkg/aggregator/context_resolver.go 92.3% <100%> (ø)
pkg/collector/python/test_aggregator.go 100% <100%> (ø)
pkg/aggregator/aggregator.go 63.59% <15%> (ø)
pkg/metrics/metric_sample.go 17.64% <40%> (ø)
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a654ff7...1b5b872.

release.json (outdated, resolved)
Member @hkaj left a comment:

Reviewed everything except the rt_loader part. Could you ask agent-core to have a look? I'm not sure my review would be useful there. Also interested in your opinion on insertN, @jbarciauskas, or someone else familiar with the sketch implementation.

pkg/aggregator/check_sampler.go (outdated, resolved)
pkg/aggregator/check_sampler.go (outdated, resolved)
pkg/aggregator/sender.go (resolved)
pkg/collector/python/init.go (outdated, resolved)
a.Sketch.Basic.InsertN(v, n)

for i := 0; i < int(n); i++ {
a.Buf = append(a.Buf, agentConfig.key(v))
Contributor Author:

This part would benefit from a bulk insert in the sparse store:

func (s *sparseStore) insert(c *Config, keys []Key) {
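A bulk insert along those lines could group duplicate keys into (key, count) runs, so the store merges each run once instead of appending key by key. A rough sketch under assumed types; keyCount and groupKeys are illustrative stand-ins, not the sparseStore API, and plain ints stand in for the sketch's Key type.

```go
package main

import (
	"fmt"
	"sort"
)

// keyCount is a (key, occurrence count) pair, a stand-in for whatever
// run representation a real bulk insert would use.
type keyCount struct {
	Key   int
	Count int
}

// groupKeys sorts the keys and collapses them into runs of equal keys,
// so each distinct key is handed to the store once with its multiplicity.
// Illustrative sketch of the bulk-insert idea, not the agent's sparseStore.
func groupKeys(keys []int) []keyCount {
	if len(keys) == 0 {
		return nil
	}
	sort.Ints(keys)
	out := []keyCount{{Key: keys[0], Count: 1}}
	for _, k := range keys[1:] {
		if last := &out[len(out)-1]; last.Key == k {
			last.Count++
		} else {
			out = append(out, keyCount{Key: k, Count: 1})
		}
	}
	return out
}

func main() {
	fmt.Println(groupKeys([]int{3, 1, 3, 3, 2})) // prints [{1 1} {2 1} {3 3}]
}
```

Since interpolated bucket samples produce long runs of identical keys, run-length grouping like this is where most of the one-by-one insertion cost could be recovered.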

Member @arbll left a comment:

Looks good, only a few nits here and there

pkg/aggregator/check_sampler.go (outdated, resolved)
rtloader/test/aggregator/aggregator.go (resolved)
---
features:
- |
[preview] Checks can now send histogram buckets to the agent to be sent as distribution metrics.
Member:

Will this feature get enabled by default for people using wildcards in Prometheus checks?

Contributor Author:

No, it'll be disabled by default, so nothing should change unless you enable the send_distribution_buckets flag on purpose; see DataDog/integrations-core#4321.

Contributor @jbarciauskas left a comment:

Summarizing feedback from @Daniel-B-Smith and me:

- The efficiency is poor and scales unintuitively (with counts), but substantial improvements are at least 1-2 weeks of effort.
- There are potential accuracy improvements that will come along with that, but they will be incremental.

If we're OK with those caveats, we're fine moving ahead with this and iterating on it.

5 participants