Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] Client side aggregation. #139

Merged
merged 1 commit into from
Apr 21, 2020
Merged

Conversation

hush-hush
Copy link
Member

First, early try of client side aggregation.

For now this looks promising. We're testing both best case scenarion (1 metric sampled a lot) and worst case scenario (a lot of metrics sampled only once).

goos: linux
goarch: amd64
pkg: github.com/DataDog/datadog-go/statsd
BenchmarkStatsdUDPSameMetricBLocking-8                   	 5984832	       218 ns/op	         0 %_dropRate
BenchmarkStatsdUDPSameMetricDropping-8                   	13658887	        85.3 ns/op	        93.3 %_dropRate
BenchmarkStatsdUDPSameMetricBLockingAggregation-8        	13163277	        81.6 ns/op	         0 %_dropRate
BenchmarkStatsdUDPSameMetricDroppingAggregation-8        	15646678	        74.6 ns/op	         0 %_dropRate
BenchmarkStatsdUDPDifferentMetricBlocking-8              	17076426	        64.3 ns/op	         0 %_dropRate
BenchmarkStatsdUDPDifferentMetricDropping-8              	17353012	        66.9 ns/op	        83.4 %_dropRate
BenchmarkStatsdUDPDifferentMetricBlockingAggregation-8   	14357695	        81.4 ns/op	         0 %_dropRate
BenchmarkStatsdUDPDifferentMetricDroppingAggregation-8   	17213880	        68.9 ns/op	         0 %_dropRate
BenchmarkStatsdUDSSameMetricBLocking-8                   	 6019815	       214 ns/op	         0 %_dropRate
BenchmarkStatsdUDSSameMetricDropping-8                   	16993280	        81.6 ns/op	        93.4 %_dropRate
BenchmarkStatsdUDSSameMetricBLockingAggregation-8        	16397464	        85.0 ns/op	         0 %_dropRate
BenchmarkStatsdUDSSameMetricDroppingAggregation-8        	15898207	        68.9 ns/op	         0 %_dropRate
BenchmarkStatsdUDPSifferentMetricBlocking-8              	15929932	        64.1 ns/op	         0 %_dropRate
BenchmarkStatsdUDSDifferentMetricDropping-8              	15504646	        64.6 ns/op	        82.0 %_dropRate
BenchmarkStatsdUDPSifferentMetricBlockingAggregation-8   	14955110	        76.3 ns/op	         0 %_dropRate
BenchmarkStatsdUDSDifferentMetricDroppingAggregation-8   	16314648	        69.6 ns/op	         0 %_dropRate

@hush-hush hush-hush force-pushed the maxime/client-side-aggregation-poc branch from c320bd6 to c1804f4 Compare March 18, 2020 22:08
@hush-hush hush-hush force-pushed the maxime/client-side-aggregation-poc branch 3 times, most recently from 4c0ea20 to 436113a Compare March 31, 2020 22:22
Copy link

@ogaca-dd ogaca-dd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great POC! A few nits.

statsd/aggregator.go Show resolved Hide resolved

func (a *aggregator) count(name string, value int64, tags []string, rate float64) error {
context := name + strings.Join(tags, "")
a.countsM.RLock()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider thread 1 after executing the instruction at line 102 and thread 2 at line 79

Thread 1 takes the lock and run until line 105, execute a.countsM.RUnlock() and stop (before executing count.sample(value))

As lock was released, thread 2 can run until line 97.

Thread 1 execute count.sample(value). It is increment the value in the map counts (counts := a.counts) but as values were already flushed (metrics = append(metrics, c.flushUnsafe())) the increment count.sample(value) is not take into account.

Maybe this is the reason of the name flushUnsafe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we should not release the read lock until we sampled the metric. This should be fine as we won't slow down the sampling.

statsd/aggregator.go Outdated Show resolved Hide resolved
@hush-hush hush-hush force-pushed the maxime/drop-metric-option branch 2 times, most recently from 6d11097 to 0260f5c Compare April 17, 2020 15:53
@hush-hush hush-hush changed the base branch from maxime/drop-metric-option to master April 17, 2020 17:31
@hush-hush hush-hush force-pushed the maxime/client-side-aggregation-poc branch from 436113a to 8256491 Compare April 17, 2020 17:58
Copy link

@ogaca-dd ogaca-dd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

func (s *setMetric) sample(v string) {
s.Lock()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As sample is used inside a block a.countsM.RLock() / a.countsM.RUnlock(), s.Lock() is not required anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is to avoid concurrent write to the map, no ?

Copy link
Contributor

@remeh remeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-written PR!

statsd/aggregator.go Outdated Show resolved Hide resolved
statsd/metrics.go Outdated Show resolved Hide resolved
statsd/metrics.go Outdated Show resolved Hide resolved
statsd/statsd.go Show resolved Hide resolved
This should reduce by a lot the number of packets (and therefore the
number of packets drop) between the client and the Agent. This should
also improve performances in hot path when sampling the same metrics.

Using fnv1a hash for the aggregator gives us a average ~12% improvement
on the aggregation benchmarks.
@hush-hush hush-hush force-pushed the maxime/client-side-aggregation-poc branch from ce91945 to 0b67271 Compare April 21, 2020 17:06
@hush-hush hush-hush merged commit 47fe936 into master Apr 21, 2020
@hush-hush hush-hush deleted the maxime/client-side-aggregation-poc branch April 21, 2020 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants