-
Notifications
You must be signed in to change notification settings - Fork 4.6k
streamer send destination metrics for repair, gossip #21564
Conversation
streamer/src/streamer.rs
Outdated
@@ -171,6 +279,7 @@ pub fn responder( | |||
last_print = now; | |||
errors = 0; | |||
} | |||
stats.maybe_submit(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a datapoint already using the name
. I'm not sure if it conflicts, seems like it should be ok though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's using name
to log "errors"
in the responder
loop. There shouldn't be a conflict.
Codecov Report
@@ Coverage Diff @@
## master #21564 +/- ##
========================================
Coverage 81.3% 81.3%
========================================
Files 516 517 +1
Lines 144185 144278 +93
========================================
+ Hits 117237 117346 +109
+ Misses 26948 26932 -16 |
streamer/src/streamer.rs
Outdated
const SUBMIT_CADENCE: Duration = Duration::from_secs(10); | ||
const MAX_REPORT_ENTRIES: usize = 5; | ||
let elapsed = self.since.as_ref().map(Instant::elapsed); | ||
if elapsed.map(|e| e < SUBMIT_CADENCE).unwrap_or_default() { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this probably also needs to submit stats if for example, host_map.len() > 1000
, or something, just so that the hashmap does not grow too large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. just wondering if the added overhead is tolerable
), | ||
( | ||
"streamer-send-host_bytes_10pct", | ||
hist.percentile(10.0).unwrap_or_default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if these hist.percentile
calculations are efficient enough not too cause too much overhead/stall here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's concerning. For a histogram with 1000 datapoints I'm seeing ~600us for all the histogram samples taken in datapoint_info
(cumulative).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does that mean for the streamer throughput?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reporting is run on a 10s interval, but I could move this up a level out of the streamer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated changes with a new reporting thread to minimize overhead in the streamer thread. Thoughts?
streamer/src/streamer.rs
Outdated
let ent = self | ||
.host_map | ||
.entry(pkt.meta.addr) | ||
.or_insert_with(SendStats::default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
lgtm, plz confirm with Stephan that there is no concern re overhead added to streamer |
Collected some measurements: @sakridge thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, lgtm
Problem
There may be unnecessary outgoing repair traffic. Determine distribution of target hosts for nodes with high outgoing packet count.
Summary of Changes
Track outgoing packet destination metrics from streamer::recv_send. Tracks repair and gossip. Records target address distribution.
Fixes #