Commit 5c0557e

Auto merge of #2434 - sgrif:sg-dont-page-for-running-jobs, r=jtgeibel
Don't page for long running jobs

This changes our pageable event for the background queue from "a job is more than $MAX_JOB_TIME minutes old" to "a job is not currently being run and is more than $MAX_JOB_TIME minutes old", meaning we will no longer page for a single job taking longer than $MAX_JOB_TIME minutes to run.

I've been getting paged a lot lately because `update_downloads` has taken longer than our threshold to complete, meaning "a job has been in the queue for too long". While I'd love to know if any other job is stuck in an infinite loop, this has never triggered for a bug like that. Any other job having that bug will clog the entire queue pretty quickly. If a job is making progress, I don't need to be paged.

For `update_downloads`, really there's nothing I can do about it unless there are two instances running concurrently (in which case I can restart the worker and delete one of them while it's unlocked), but if I didn't do that it will eventually resolve itself (unless we're continuously being hammered by bots in which case I will get paged when this inevitably clogs the entire queue). The worst case scenario for `update_downloads` is that download counts are out of sync for a while. Frankly, if it's not blocking index updates, I do not need to be woken up for it.

I've had to change the query slightly so the count happens on the Rust side, since we can't do `FOR UPDATE` on an aggregate query. We could do this with a subselect, but we'd need to drop to raw SQL for that which is a bit of a pain here for very little gain.

2 parents 06bfd00 + 91e3899 commit 5c0557e

1 file changed: 6 additions and 2 deletions

src/bin/monitor.rs

@@ -22,6 +22,7 @@ fn main() -> Result<(), Error> {
 fn check_stalled_background_jobs(conn: &PgConnection) -> Result<(), Error> {
     use cargo_registry::schema::background_jobs::dsl::*;
     use diesel::dsl::*;
+    use diesel::sql_types::Integer;

     const EVENT_KEY: &str = "background_jobs";

@@ -32,9 +33,12 @@ fn check_stalled_background_jobs(conn: &PgConnection) -> Result<(), Error> {
         .unwrap_or(15);

     let stalled_job_count = background_jobs
+        .select(1.into_sql::<Integer>())
         .filter(created_at.lt(now - max_job_time.minutes()))
-        .count()
-        .get_result::<i64>(conn)?;
+        .for_update()
+        .skip_locked()
+        .load::<i32>(conn)?
+        .len();

     let event = if stalled_job_count > 0 {
         on_call::Event::Trigger {
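
The change in what counts as "stalled" can be modeled without a database. The sketch below is a std-only illustration, not the code in `src/bin/monitor.rs`. The `Job` struct, its `locked` flag, and the two function names are hypothetical. It assumes a running job holds a row lock, which is what lets `FOR UPDATE SKIP LOCKED` leave it out of the result.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical in-memory stand-in for a row in `background_jobs`.
struct Job {
    created_at: SystemTime,
    /// Assumption: true while a worker holds the row lock (the job is running).
    locked: bool,
}

/// Old rule: every job older than `max_job_time` counts as stalled.
fn stalled_before(jobs: &[Job], now: SystemTime, max_job_time: Duration) -> usize {
    let cutoff = now - max_job_time;
    jobs.iter().filter(|j| j.created_at < cutoff).count()
}

/// New rule: same age check, but locked rows are skipped, mirroring
/// `FOR UPDATE SKIP LOCKED`. The count is taken on the Rust side.
fn stalled_after(jobs: &[Job], now: SystemTime, max_job_time: Duration) -> usize {
    let cutoff = now - max_job_time;
    jobs.iter()
        .filter(|j| j.created_at < cutoff && !j.locked)
        .count()
}

fn main() {
    let now = SystemTime::now();
    let minutes = |m: u64| Duration::from_secs(m * 60);
    let jobs = vec![
        Job { created_at: now - minutes(40), locked: true },  // long-running, in progress
        Job { created_at: now - minutes(20), locked: false }, // waiting too long
        Job { created_at: now - minutes(5), locked: false },  // recently queued
    ];

    println!("before: {}", stalled_before(&jobs, now, minutes(15))); // prints "before: 2"
    println!("after: {}", stalled_after(&jobs, now, minutes(15)));   // prints "after: 1"
}
```

Under these assumptions, the 40-minute job that is still running no longer contributes to the count, while the unlocked 20-minute job still does, so a clogged queue would still page.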
