Skip to content

Short cut distributor sample pushes. #272

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 7, 2017
Merged

Short cut distributor sample pushes. #272

merged 3 commits into from
Feb 7, 2017

Conversation

tomwilkie
Copy link
Contributor

When enough samples succeed (or fail), return the rpc - don't wait for all of them.

This should massively reduce the 99th percentile latency (currently up in the 200ms), and bring it a lot closer to the average (of 40ms).

minSuccess: minSuccess,
maxFailures: len(ingesters[i]) - minSuccess,
succeeded: 0,
failed: 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omit these two from the initialization because it's the zero value?

}(hostname, samples)
pushTracker := pushTracker{
samplesPending: int32(len(samples)),
samplesFailed: 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

err := d.sendSamplesErr(ctx, ingester, sampleTrackers)

// If we suceed, decrement each sample's pending count by one. If we reach
// the requred number of successful puts on this sample, then decrement the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requred -> required

// goroutine will write to either channel.
for i := range sampleTrackers {
if err != nil {
if atomic.AddInt32(&sampleTrackers[i].failed, 1) > int32(sampleTrackers[i].maxFailures) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be < rather than >?

@juliusv
Copy link
Contributor

juliusv commented Feb 7, 2017

Looks good in general, though I'm wondering how important push latency is. The user will not see it, and we're still doing the same amount of processing work (or more) in the distributor, now with more code complexity.

@juliusv
Copy link
Contributor

juliusv commented Feb 7, 2017

Oh, and could you rebase on master and ensure passing tests?

@tomwilkie
Copy link
Contributor Author

Looks good in general, though I'm wondering how important push latency is. The user will not see it, and we're still doing the same amount of processing work (or more) in the distributor, now with more code complexity.

Well it turns out which even a modest number of samples/s, latency is quite important, as we only push from a fixed number of shards in prometheus - so lower latency = more samples/s. We should also improve the sharding situation in prometheus (maybe have a dynamic number of shared), but in the end we will always be subject to this constraint as we want to preserver sample ordering...

Oh, and could you rebase on master and ensure passing tests?

Done; I'm also adding a write through test to the distributor, which will take an hour or so.

@juliusv
Copy link
Contributor

juliusv commented Feb 7, 2017

Ok, makes sense. I'll wait for everything to be ready here then.

@juliusv
Copy link
Contributor

juliusv commented Feb 7, 2017

👍

@juliusv juliusv merged commit 70ee874 into master Feb 7, 2017
@juliusv juliusv deleted the short-cut-push branch February 7, 2017 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants