Skip to content

Don't page for long running jobs #2434

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 2, 2020

Conversation

sgrif
Copy link
Contributor

@sgrif sgrif commented Apr 16, 2020

This changes our pageable event for the background queue from "a job is
more than $MAX_JOB_TIME minutes old" to "a job is not currently being
run and is more than $MAX_JOB_TIME minutes old", meaning we will no
longer page for a single job taking longer than $MAX_JOB_TIME minutes to
run.

I've been getting paged a lot lately because update_downloads has
taken longer than our threshold to complete, meaning "a job has been in
the queue for too long". While I'd love to know if any other job is
stuck in an infinite loop, this has never triggered for a bug like that.
Any other job having that bug will clog the entire queue pretty quickly.

If a job is making progress, I don't need to be paged. For
update_downloads, really there's nothing I can do about it unless
there are two instances running concurrently (in which case I can
restart the worker and delete one of them while it's unlocked), but if I
didn't do that it will eventually resolve itself (unless we're
continuously being hammered by bots in which case I will get paged when
this inevitably clogs the entire queue).

The worst case scenario for update_downloads is that download counts
are out of sync for a while. Frankly, if it's not blocking index
updates, I do not need to be woken up for it.

I've had to change the query slightly so the count happens on the Rust
side, since we can't do FOR UPDATE on an aggregate query. We could do
this with a subselect, but we'd need to drop to raw SQL for that which
is a bit of a pain here for very little gain.

This changes our pageable event for the background queue from "a job is
more than $MAX_JOB_TIME minutes old" to "a job is not currently being
run and is more than $MAX_JOB_TIME minutes old", meaning we will no
longer page for a single job taking longer than $MAX_JOB_TIME minutes to
run.

I've been getting paged a lot lately because `update_downloads` has
taken longer than our threshold to complete, meaning "a job has been in
the queue for too long". While I'd love to know if any other job is
stuck in an infinite loop, this has never triggered for a bug like that.
Any other job having that bug will clog the entire queue pretty quickly.

If a job is making progress, I don't need to be paged. For
`update_downloads`, really there's nothing I can do about it unless
there are two instances running concurrently (in which case I can
restart the worker and delete one of them while it's unlocked), but if I
didn't do that it will eventually resolve itself (unless we're
continuously being hammered by bots in which case I will get paged when
this inevitably clogs the entire queue).

The worst case scenario for `update_downloads` is that download counts
are out of sync for a while. Frankly, if it's not blocking index
updates, I do not need to be woken up for it.

I've had to change the query slightly so the count happens on the Rust
side, since we can't do `FOR UPDATE` on an aggregate query. We could do
this with a subselect, but we'd need to drop to raw SQL for that which
is a bit of a pain here for very little gain.
@rust-highfive
Copy link

r? @jtgeibel

(rust_highfive has picked a reviewer for you, use r? to override)

@jtgeibel
Copy link
Member

jtgeibel commented May 2, 2020

@bors r+

@bors
Copy link
Contributor

bors commented May 2, 2020

📌 Commit 91e3899 has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented May 2, 2020

⌛ Testing commit 91e3899 with merge 5c0557e...

@bors
Copy link
Contributor

bors commented May 2, 2020

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing 5c0557e to master...

@bors bors merged commit 5c0557e into rust-lang:master May 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants