Reduce database writes for the download endpoint #3413


Merged: 4 commits into rust-lang:master on Mar 16, 2021

Conversation

pietroalbini (Member)

crates.io receives a lot of download requests, and we're reaching a point where executing a write query to PostgreSQL for each request is causing operational issues. Over the past few weeks we've been getting regular pages because download requests were slow to execute or outright failed for a couple of seconds at a time, as the database was overwhelmed.

This PR avoids that problem by collecting download counts in memory and periodically persisting them to the database through the new downloads_counter module, hopefully reducing the write load on our primary database.

The implementation is designed to minimize locking, to prevent requests from being unnecessarily delayed. Instead of storing the counters in a RwLock<HashMap<i32, AtomicUsize>>, which would require locking the whole map whenever a new version needs to be tracked, this uses the dashmap crate.

DashMap offers an API similar to the standard library's HashMap, but it stores the items in num_cpus()*4 individually locked RwLock<HashMap<K, V>>s (called "shards"), picking the shard based on the key's hash. Splitting the items into shards reduces how much of the DashMap is locked at any given time, so concurrent download requests only block each other when they hit the same shard.
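
As an illustration, here is a minimal sketch of the counter described above, using dashmap's Entry API; the names are illustrative, and the real type lives in the new downloads_counter module:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use dashmap::DashMap;

/// Minimal sketch of an in-memory downloads counter backed by DashMap.
pub struct DownloadsCounter {
    counters: DashMap<i32, AtomicUsize>,
}

impl DownloadsCounter {
    pub fn new() -> Self {
        Self { counters: DashMap::new() }
    }

    /// Called on every download request: only the shard containing
    /// `version_id` is locked, and only for the duration of the lookup.
    pub fn increment(&self, version_id: i32) {
        self.counters
            .entry(version_id)
            .or_insert_with(|| AtomicUsize::new(0))
            .fetch_add(1, Ordering::Relaxed);
    }
}
```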

Persisting download counts also takes advantage of sharding: to avoid "stopping the world" when the background thread persists the counters, only one shard is persisted at a time. The current configuration persists all the shards over the span of one minute, and the interval is configurable at runtime with the DOWNLOADS_PERSIST_INTERVAL_MS environment variable.

To avoid losing downloads, the code also persists all the shards at once when the server gracefully shuts down. It's still possible to lose download counts if the server is killed without a chance to shut down gracefully (for example due to platform issues), but losing less than a minute of download counts is far less important than reducing database load.
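
To make the scheduling concrete, here is a hedged sketch of the background loop and shutdown path described above; persist_shard is a hypothetical helper standing in for the code that flushes one shard's counts to PostgreSQL:

```rust
use std::time::Duration;

/// Sketch of the background persistence loop (illustrative, not the
/// actual crates.io code). `interval` corresponds to the value of
/// DOWNLOADS_PERSIST_INTERVAL_MS, spread evenly across the shards so the
/// whole map is flushed once per interval while only ever locking one
/// shard at a time.
fn run_persist_loop(counter: &DownloadsCounter, num_shards: usize, interval: Duration) {
    let sleep_between_shards = interval / num_shards as u32;
    let mut shard = 0;
    loop {
        counter.persist_shard(shard); // hypothetical: flush one shard to the DB
        shard = (shard + 1) % num_shards;
        std::thread::sleep(sleep_between_shards);
    }
}

/// On graceful shutdown every shard is flushed immediately, so at most
/// the counts accumulated since the last per-shard flush can be lost.
fn persist_on_shutdown(counter: &DownloadsCounter, num_shards: usize) {
    for shard in 0..num_shards {
        counter.persist_shard(shard);
    }
}
```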

r? @jtgeibel

@pietroalbini (Member Author)

Opened the PR early so the review can start ahead of time, but this PR is still lacking tests (which we should add before deploying it).

```rust
// Review context: tail end of a println!() call logging persistence stats.
    counted_versions,
    counted_downloads,
    pending_downloads,
);
```
Member

Out of curiosity, is there a reason why we're using println!() everywhere instead of the log or tracing macros?

Member Author

I don't think there is really a reason, but I don't see value in switching to another library right now.

@jtgeibel (Member) left a comment

Looks great to me overall

@pietroalbini (Member Author)

The PR has been deployed to staging.crates.io and it seems to correctly persist download counts!

There are two things that are left to do:

  • Add automated tests for DownloadsCounter itself: it's currently exercised indirectly by the download test, but it'd be better to add in-depth tests for it, as it's going to be one of the systems in the critical path for download requests.
  • Ensure pending downloads are persisted when the application is rebooted: this works locally and the code should be implemented correctly, but a brief test on staging shows it might not work on Heroku.

As we're getting a lot of pages about this, I don't think it's critical to address those two issues before deploying to production, but we should make sure we land follow-up PRs for them.

@jtgeibel (Member)

Looks great to me! On my local setup, this improved the endpoint from 876 rps to 14,068 rps! (Though I don't have many versions, so this is mostly just testing atomics, and not any sort of contention on the DashMap.)

@bors r+

@bors (Contributor) commented Mar 16, 2021

📌 Commit f646e45 has been approved by jtgeibel

@bors (Contributor) commented Mar 16, 2021

⌛ Testing commit f646e45 with merge bc58bb7...

@bors (Contributor) commented Mar 16, 2021

☀️ Test successful - checks-actions
Approved by: jtgeibel
Pushing bc58bb7 to master...

@bors bors merged commit bc58bb7 into rust-lang:master Mar 16, 2021
@pietroalbini pietroalbini deleted the downloads-counter branch March 16, 2021 08:47
bors added a commit that referenced this pull request Mar 16, 2021
Prevent transaction deadlocks in DownloadsCounter

While analyzing the behavior of `DownloadsCounter` (introduced in #3413) in production, we noticed that there were a few cases of transaction deadlocks tonight in the background thread persisting the download counts.

I did some investigation on the cause and came up with this fix. As [recommended by the PostgreSQL docs](https://www.postgresql.org/docs/11/explicit-locking.html#LOCKING-DEADLOCKS), to avoid deadlocks when multiple transactions affect multiple rows in a table, the affected keys need to be modified in a consistent order across the transactions. We implement that by sorting the items we're going to insert by `version_id`.
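
A minimal sketch of that ordering fix, under the assumption that the drained counters are collected into a Vec before being inserted (illustrative names, not the exact crates.io code):

```rust
/// Sort the drained (version_id, downloads) pairs before building the
/// INSERT ... ON CONFLICT statement, so concurrent transactions always
/// touch the version_downloads rows in the same order and can't deadlock.
fn sort_for_insert(mut counted: Vec<(i32, u64)>) -> Vec<(i32, u64)> {
    counted.sort_unstable_by_key(|&(version_id, _)| version_id);
    counted
}
```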

<details><summary>Local reproduction instructions</summary>

The exact error we saw in production can be reproduced by opening two instances of `psql` locally and running these two sets of queries, one in each `psql` (without running `COMMIT`):

```sql
-- First instance of psql
BEGIN;
INSERT INTO version_downloads (version_id) VALUES (1) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1;

-- Second instance of psql
BEGIN;
INSERT INTO version_downloads (version_id) VALUES (2) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1;
```

Then, once the commands are sent to both instances of `psql`, sending these other commands will cause the deadlock:

```sql
-- First instance of psql
INSERT INTO version_downloads (version_id) VALUES (2) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1;

-- Second instance of psql
INSERT INTO version_downloads (version_id) VALUES (1) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1;
```

Instead, if both transactions try to insert first `1` and then `2`, the first `psql` will work fine, and the second `psql` will wait for the first one to `COMMIT` before applying its changes. That's the behavior we want, and it's the one this PR implements.

</details>

r? `@jtgeibel`
@pietroalbini (Member Author)

Opened #3415 and #3416 to handle the outstanding work on this.
