Don't page for long running jobs #2434
Merged
This changes our pageable event for the background queue from "a job is
more than `$MAX_JOB_TIME` minutes old" to "a job is not currently being
run and is more than `$MAX_JOB_TIME` minutes old", meaning we will no
longer page for a single job taking longer than `$MAX_JOB_TIME` minutes to
run.
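
To make the new condition concrete, here's a minimal sketch of what the
monitor's check could look like with diesel. This is illustrative, not the
actual crates.io code: the `background_jobs` table definition is trimmed
down, and it assumes (as in swirl-style queues) that a worker holds a
`FOR UPDATE` row lock on a job while running it, so `SKIP LOCKED` filters
out jobs that are actively in progress.

```rust
use diesel::dsl::{now, IntervalDsl};
use diesel::prelude::*;

// Trimmed-down table definition for illustration; the real table has
// more columns.
diesel::table! {
    background_jobs (id) {
        id -> BigInt,
        created_at -> Timestamp,
    }
}

/// Counts jobs that are older than `max_job_time` minutes *and* not
/// currently locked by a worker. A long-running job that is still
/// making progress stays locked, so it is skipped and won't page us.
fn stalled_job_count(conn: &PgConnection, max_job_time: i32) -> QueryResult<usize> {
    use self::background_jobs::dsl::*;

    let stalled_jobs: Vec<i64> = background_jobs
        .select(id)
        .filter(created_at.lt(now - max_job_time.minutes()))
        .for_update()
        .skip_locked()
        .load(conn)?;

    // `FOR UPDATE` can't be combined with an aggregate, so the count
    // happens on the Rust side (see below).
    Ok(stalled_jobs.len())
}
```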
I've been getting paged a lot lately because `update_downloads` has
taken longer than our threshold to complete, meaning "a job has been in
the queue for too long". While I'd love to know if any other job is
stuck in an infinite loop, this alert has never triggered for a bug like
that, and any other job with that bug would clog the entire queue pretty
quickly anyway.

If a job is making progress, I don't need to be paged. For
`update_downloads`, there's really nothing I can do about it unless
there are two instances running concurrently (in which case I can
restart the worker and delete one of them while it's unlocked), but even
if I didn't do that, it would eventually resolve itself (unless we're
continuously being hammered by bots, in which case I will get paged when
this inevitably clogs the entire queue).
The worst case scenario for `update_downloads` is that download counts
are out of sync for a while. Frankly, if it's not blocking index
updates, I do not need to be woken up for it.
I've had to change the query slightly so the count happens on the Rust
side, since we can't do `FOR UPDATE` on an aggregate query. We could do
this with a subselect, but we'd need to drop to raw SQL for that, which
is a bit of a pain here for very little gain.
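
For reference, the aggregate shape of the query would look roughly like
the sketch below (same illustrative names as above). Postgres rejects
`FOR UPDATE` on it because row locks can't be taken on the output of an
aggregate, which is why the version above loads the matching ids and
counts them with `.len()` instead.

```rust
// Old shape (sketch): counting in SQL. Adding `.for_update()` to this
// query fails, since Postgres doesn't allow FOR UPDATE with aggregate
// functions.
let stalled_jobs: i64 = background_jobs
    .filter(created_at.lt(now - max_job_time.minutes()))
    .count()
    .get_result(conn)?;
```

The raw-SQL subselect alternative would wrap the locking query in an
outer `SELECT COUNT(*)`, which diesel's query builder can't express
without dropping to `sql_query`.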