Skip to content

Commit

Permalink
Auto merge of #709 - Mark-Simulacrum:add-metric, r=Mark-Simulacrum
Browse files Browse the repository at this point in the history
Emit metrics for record_progress endpoint

Previously we were only tracking the worker time, not the endpoint. We see that there is a direct correlation with the throughput of a job and the worker time. This seems wrong to me, because as long as the worker is keeping up with the input rate, the throughput shouldn't be affected.

Note that we believe that the worker should not affect the HTTP endpoint at all - we connect these with a bounded queue and pushing into the queue is done with `try_send`, which shouldn't block (https://docs.rs/crossbeam-channel/latest/crossbeam_channel/struct.Sender.html#method.try_send) and returns an error if the queue is full. We already emit a metric if the queue is full, and that's not happening here.

The hope is that the extra metric here gives us some clue for what the problem is.

Metric graphs:

<img width="1197" alt="image" src="https://github.com/rust-lang/crater/assets/5047365/34fac874-254f-4a91-b75a-4d2d9e25aea0">

<img width="1015" alt="image" src="https://github.com/rust-lang/crater/assets/5047365/d21ea685-56ef-49bf-943b-6dbe2648f336">
  • Loading branch information
bors committed Nov 5, 2023
2 parents d4e2717 + bac1249 commit 4d35849
Showing 1 changed file with 13 additions and 3 deletions.
16 changes: 13 additions & 3 deletions src/server/routes/agent.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ use http::Response;
use hyper::Body;
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::time::Instant;
use warp::{self, Filter, Rejection};

#[derive(Deserialize)]
Expand Down Expand Up @@ -214,7 +215,7 @@ impl RecordProgressThread {

metrics
.crater_endpoint_time
.with_label_values(&["record_progress"])
.with_label_values(&["record_progress_worker"])
.observe(start.elapsed().as_secs_f64());
}
}));
Expand Down Expand Up @@ -298,14 +299,23 @@ fn endpoint_record_progress(
data: Arc<Data>,
_auth: AuthDetails,
) -> Fallible<Response<Body>> {
match data.record_progress_worker.queue.try_send(result) {
let start = Instant::now();

let ret = match data.record_progress_worker.queue.try_send(result) {
Ok(()) => Ok(ApiResponse::Success { result: true }.into_response()?),
Err(crossbeam_channel::TrySendError::Full(_)) => {
data.metrics.crater_bounced_record_progress.inc_by(1);
Ok(ApiResponse::<()>::SlowDown.into_response()?)
}
Err(crossbeam_channel::TrySendError::Disconnected(_)) => unreachable!(),
}
};

data.metrics
.crater_endpoint_time
.with_label_values(&["record_progress_endpoint"])
.observe(start.elapsed().as_secs_f64());

ret
}

fn endpoint_heartbeat(data: Arc<Data>, auth: AuthDetails) -> Fallible<Response<Body>> {
Expand Down

0 comments on commit 4d35849

Please sign in to comment.