Short cut distributor sample pushes. #272
Conversation
distributor/distributor.go
Outdated
minSuccess:  minSuccess,
maxFailures: len(ingesters[i]) - minSuccess,
succeeded:   0,
failed:      0,
Omit these two from the initialization because it's the zero value?
distributor/distributor.go
Outdated
}(hostname, samples)
pushTracker := pushTracker{
	samplesPending: int32(len(samples)),
	samplesFailed:  0,
Same here.
distributor/distributor.go
Outdated
err := d.sendSamplesErr(ctx, ingester, sampleTrackers)

// If we suceed, decrement each sample's pending count by one. If we reach
// the requred number of successful puts on this sample, then decrement the
requred -> required
distributor/distributor.go
Outdated
// goroutine will write to either channel.
for i := range sampleTrackers {
	if err != nil {
		if atomic.AddInt32(&sampleTrackers[i].failed, 1) > int32(sampleTrackers[i].maxFailures) {
Shouldn't this be < rather than >?
Looks good in general, though I'm wondering how important push latency is. The user will not see it, and we're still doing the same amount of processing work (or more) in the distributor, now with more code complexity.
Oh, and could you rebase on master and ensure passing tests?
Force-pushed "…fail), return the rpc - don't wait for all of them." from 89e7eaf to 3ba05bd
Well, it turns out with even a modest number of samples/s, latency is quite important, as we only push from a fixed number of shards in Prometheus - so lower latency = more samples/s. We should also improve the sharding situation in Prometheus (maybe have a dynamic number of shards), but in the end we will always be subject to this constraint as we want to preserve sample ordering...
Done; I'm also adding a write-through test to the distributor, which will take an hour or so.
Ok, makes sense. I'll wait for everything to be ready here then.
👍
When enough samples succeed (or fail), return the rpc - don't wait for all of them.
This should massively reduce the 99th percentile latency (currently up in the 200ms range) and bring it a lot closer to the average (of 40ms).