Skip to content

Prevent transaction deadlocks in DownloadsCounter #3414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 16, 2021

Conversation

pietroalbini
Copy link
Member

While analyzing the behavior of DownloadsCounter (introduced in #3413) in production, we noticed that there were a few cases of transaction deadlocks tonight in the background thread persisting the download counts.

I did some investigation on the cause of this and came up with this fix. As recommended by the PostgreSQL docs, to avoid deadlocks when multiple transactions affect multiple rows in a table, the affected keys needs to be modified in a consistent order across the transactions. We implement that by sorting by version_id the items we're going to insert.

Local reproduction instructions

The exact error we saw in production can be reproduced by opening two instances of psql locally and running those two set of queries in each psql (without running COMMIT):

-- First instance of psql
BEGIN;
INSERT INTO version_downloads (version_id) VALUES (1) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1;

-- Second instance of psql
BEGIN;
INSERT INTO version_downloads (version_id) VALUES (2) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1;

Then, once the commands are sent to both instances of psql, sending these other commands will cause the deadlock:

-- First instance of psql
INSERT INTO version_downloads (version_id) VALUES (2) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1;

-- Second instance of psql
INSERT INTO version_downloads (version_id) VALUES (1) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1;

Instead, if both transactions try to insert first 1 and then 2 the first psql will work fine, and the second psql will wait for the first psql to COMMIT before applying the changes. That's the behavior we want and it's the one this PR implements.

r? @jtgeibel

@jtgeibel
Copy link
Member

Awesome solution, thanks!

@bors r+

@bors
Copy link
Contributor

bors commented Mar 16, 2021

📌 Commit 426c635 has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented Mar 16, 2021

⌛ Testing commit 426c635 with merge 86ff7ad...

@Turbo87
Copy link
Member

Turbo87 commented Mar 16, 2021

maybe in addition to that we could also add a small randomized delay to the updates, so that the threads don't try to write at the exact same time?

@bors
Copy link
Contributor

bors commented Mar 16, 2021

☀️ Test successful - checks-actions
Approved by: jtgeibel
Pushing 86ff7ad to master...

@bors bors merged commit 86ff7ad into rust-lang:master Mar 16, 2021
@pietroalbini pietroalbini deleted the deadlock branch March 16, 2021 15:07
@pietroalbini
Copy link
Member Author

@Turbo87 adding a delay is not needed after this PR, as the deadlock shouldn't happen anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants