Short cut distributor sample pushes. #272

tomwilkie · 2017-02-06T20:32:16Z

When enough samples succeed (or fail), return the rpc - don't wait for all of them.

This should massively reduce the 99th percentile latency (currently up around 200ms) and bring it a lot closer to the average (about 40ms).
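
For illustration, the short-cut idea can be sketched outside the distributor. This is a minimal sketch, not the code in this PR: `pushToReplicas`, `errPushFailed`, and the ingester names are hypothetical, and it tracks a single write rather than per-sample counters.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errPushFailed is a hypothetical sentinel error used only by this sketch.
var errPushFailed = errors.New("push failed")

// pushToReplicas (hypothetical helper) calls push once per replica, each in
// its own goroutine, and returns as soon as the outcome is decided: nil after
// minSuccess successes, an error once more than len(replicas)-minSuccess
// pushes have failed.
func pushToReplicas(replicas []string, minSuccess int, push func(replica string) error) error {
	if minSuccess <= 0 {
		return nil
	}
	maxFailures := len(replicas) - minSuccess
	// Buffered so slow replicas can still report after we have returned.
	results := make(chan error, len(replicas))
	for _, r := range replicas {
		go func(r string) { results <- push(r) }(r)
	}
	succeeded, failed := 0, 0
	for range replicas {
		if err := <-results; err != nil {
			failed++
			if failed > maxFailures {
				return fmt.Errorf("quorum unreachable after %d failures: %w", failed, err)
			}
			continue
		}
		succeeded++
		if succeeded >= minSuccess {
			return nil
		}
	}
	return errors.New("not enough replicas to reach quorum")
}

func main() {
	replicas := []string{"ingester-1", "ingester-2", "ingester-3"}
	latency := map[string]time.Duration{
		"ingester-1": 10 * time.Millisecond,
		"ingester-2": 20 * time.Millisecond,
		"ingester-3": 200 * time.Millisecond, // the slow tail
	}
	start := time.Now()
	err := pushToReplicas(replicas, 2, func(r string) error {
		time.Sleep(latency[r])
		return nil
	})
	// Returns once the two fast replicas have answered.
	fmt.Println("err:", err, "elapsed:", time.Since(start).Round(10*time.Millisecond))

	err = pushToReplicas(replicas, 2, func(r string) error {
		if r == "ingester-1" {
			return nil
		}
		return errPushFailed
	})
	fmt.Println("err:", err)
}
```

The result channel is buffered to the number of replicas so that goroutines still in flight when the call returns can finish without blocking.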

juliusv · 2017-02-07T11:07:03Z

distributor/distributor.go

+			minSuccess:  minSuccess,
+			maxFailures: len(ingesters[i]) - minSuccess,
+			succeeded:   0,
+			failed:      0,


Omit these two from the initialization, since they're the zero value?

juliusv · 2017-02-07T11:09:50Z

distributor/distributor.go

-		}(hostname, samples)
+	pushTracker := pushTracker{
+		samplesPending: int32(len(samples)),
+		samplesFailed:  0,


juliusv · 2017-02-07T11:11:25Z

distributor/distributor.go

+	err := d.sendSamplesErr(ctx, ingester, sampleTrackers)
+
+	// If we suceed, decrement each sample's pending count by one.  If we reach
+	// the requred number of successful puts on this sample, then decrement the


requred -> required

juliusv · 2017-02-07T11:27:30Z

distributor/distributor.go

+	// goroutine will write to either channel.
+	for i := range sampleTrackers {
+		if err != nil {
+			if atomic.AddInt32(&sampleTrackers[i].failed, 1) > int32(sampleTrackers[i].maxFailures) {


Shouldn't this be < rather than >?
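
To make the comparison easier to reason about, here is a small standalone sketch of a per-sample counter. The field names follow the diff, but the constructor and the `recordFailure`/`recordSuccess` methods are hypothetical and only show how a `> maxFailures` check behaves; they are not the code under review.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sampleTracker reuses the field names from the diff; everything else in
// this sketch is an assumption made for illustration.
type sampleTracker struct {
	minSuccess  int
	maxFailures int
	succeeded   int32
	failed      int32
}

// newSampleTracker derives maxFailures the same way the diff does:
// number of replicas minus the required successes.
func newSampleTracker(replicas, minSuccess int) *sampleTracker {
	return &sampleTracker{minSuccess: minSuccess, maxFailures: replicas - minSuccess}
}

// recordFailure reports true once the sample can no longer reach minSuccess,
// that is, when the failure count exceeds maxFailures.
func (s *sampleTracker) recordFailure() bool {
	return atomic.AddInt32(&s.failed, 1) > int32(s.maxFailures)
}

// recordSuccess reports true exactly once, on the write that reaches minSuccess.
func (s *sampleTracker) recordSuccess() bool {
	return atomic.AddInt32(&s.succeeded, 1) == int32(s.minSuccess)
}

func main() {
	t := newSampleTracker(3, 2)    // 3 replicas, 2 required, so maxFailures = 1
	fmt.Println(t.recordFailure()) // false: one failure is still tolerable
	fmt.Println(t.recordFailure()) // true: the required successes are now out of reach
}
```

Note that in this sketch `recordFailure` keeps returning true for every further failure, so a caller that must signal the error only once needs its own guard.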

juliusv · 2017-02-07T11:34:33Z

Looks good in general, though I'm wondering how important push latency is. The user will not see it, and we're still doing the same amount of processing work (or more) in the distributor, now with more code complexity.

juliusv · 2017-02-07T11:35:44Z

Oh, and could you rebase on master and ensure passing tests?


tomwilkie · 2017-02-07T11:45:07Z

> Looks good in general, though I'm wondering how important push latency is. The user will not see it, and we're still doing the same amount of processing work (or more) in the distributor, now with more code complexity.

Well, it turns out that with even a modest number of samples/s, latency is quite important, as we only push from a fixed number of shards in Prometheus, so lower latency = more samples/s. We should also improve the sharding situation in Prometheus (maybe have a dynamic number of shards), but in the end we will always be subject to this constraint, as we want to preserve sample ordering...
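
A rough back-of-the-envelope version of that constraint, assuming each shard sends one batch at a time and waits for the response before sending the next. The shard count and batch size below are made-up numbers for illustration; the 40ms and 200ms latencies are the average and 99th percentile quoted in the PR description.

```go
package main

import "fmt"

// maxSamplesPerSecond is a hypothetical helper: the throughput ceiling when
// every shard has at most one batch in flight and each push takes
// latencyMillis to complete.
func maxSamplesPerSecond(shards, batchSize int, latencyMillis float64) float64 {
	return float64(shards*batchSize) * 1000 / latencyMillis
}

func main() {
	const shards, batchSize = 10, 100 // assumed values, not taken from Prometheus

	fmt.Printf("at 40ms:  %.0f samples/s\n", maxSamplesPerSecond(shards, batchSize, 40))
	fmt.Printf("at 200ms: %.0f samples/s\n", maxSamplesPerSecond(shards, batchSize, 200))
}
```

Under these assumptions the ceiling is 25000 samples/s at 40ms and 5000 samples/s at 200ms, so the ceiling scales inversely with push latency as long as the shard count stays fixed.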

> Oh, and could you rebase on master and ensure passing tests?

Done; I'm also adding a write through test to the distributor, which will take an hour or so.

juliusv · 2017-02-07T12:00:15Z

Ok, makes sense. I'll wait for everything to be ready here then.

juliusv · 2017-02-07T12:49:13Z

👍

tomwilkie mentioned this pull request Feb 7, 2017

Implement per-user series limits #273

Merged

juliusv reviewed Feb 7, 2017


tomwilkie added 2 commits February 7, 2017 11:42

Short put distributor sample pushes; when enough samples succeed (or …

a58ef5f

…fail), return the rpc - don't wait for all of them.

Review feedback and start of a test

3ba05bd

tomwilkie force-pushed the short-cut-push branch from 89e7eaf to 3ba05bd Compare February 7, 2017 11:42

Finish the basic distributor tests

98dbdad

juliusv merged commit 70ee874 into master Feb 7, 2017

juliusv deleted the short-cut-push branch February 7, 2017 13:18
