-
Notifications
You must be signed in to change notification settings - Fork 644
Reduce database writes for the download endpoint #3413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Opened the PR early to allow the review to start ahead of time, but this PR is lacking tests (which we should add before deploying the PR). |
counted_versions, | ||
counted_downloads, | ||
pending_downloads, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, is there a reason why we're using println!()
everywhere instead of the log
or tracing
macros?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is really a reason, but I don't see value in switching to another library right now.
d6a4534
to
0c6ec13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me overall
0c6ec13
to
a13b94e
Compare
The PR has been deployed to staging.crates.io and it seems to correctly persist download counts! There are two things that are left to do:
As we're getting a lot of pages about this I don't think it's critical we address those two issues before deploying to production, but we should make sure we land followup PRs for it. |
Looks great to me! On my local setup, this improved the endpoint from 876 rps to 14,068 rps! (Though I don't have many versions, so this is mostly just testing atomics, and not any sort of contention on the @bors r+ |
📌 Commit f646e45 has been approved by |
☀️ Test successful - checks-actions |
Prevent transaction deadlocks in DownloadsCounter While analyzing the behavior of `DownloadsCounter` (introduced in #3413) in production, we noticed that there were a few cases of transaction deadlocks tonight in the background thread persisting the download counts. I did some investigation on the cause of this and came up with this fix. As [recommended by the PostgreSQL docs](https://www.postgresql.org/docs/11/explicit-locking.html#LOCKING-DEADLOCKS), to avoid deadlocks when multiple transactions affect multiple rows in a table, the affected keys needs to be modified in a consistent order across the transactions. We implement that by sorting by `version_id` the items we're going to insert. <details><summary>Local reproduction instructions</summary> The exact error we saw in production can be reproduced by opening two instances of `psql` locally and running those two set of queries in each `psql` (without running `COMMIT`): ```sql -- First instance of psql BEGIN; INSERT INTO version_downloads (version_id) VALUES (1) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1; -- Second instance of psql BEGIN; INSERT INTO version_downloads (version_id) VALUES (2) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1; ``` Then, once the commands are sent to both instances of `psql`, sending these other commands will cause the deadlock: ```sql -- First instance of psql INSERT INTO version_downloads (version_id) VALUES (2) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1; -- Second instance of psql INSERT INTO version_downloads (version_id) VALUES (1) ON CONFLICT (version_id, date) DO UPDATE SET downloads = excluded.downloads + 1; ``` Instead, if both transactions try to insert first `1` and then `2` the first `psql` will work fine, and the second `psql` will wait for the first `psql` to `COMMIT` before applying the changes. That's the behavior we want and it's the one this PR implements. </details> r? `@jtgeibel`
crates.io receives a lot of download requests, and we're reaching a point where executing a write query to PostgreSQL for each request is causing operational issues. Over the past few weeks we've been getting regular pages due to download requests being slow to execute or outright fail for a couple of seconds, due to the database being overwhelmed.
This PR avoids that problem by collecting download counts in memory and periodically persisting them into the database, hopefully reducing the write load coming to our primary database thanks to the new
downloads_counter
module.The implementation is designed to reduce the amount of locking needed as much as possible to prevent requests from being unnecessarily delayed. Instead of storing the counters in a
RwLock<HashMap<i32, AtomicUsize>>
, which would require locking the whole map when a new version needs to be tracked, this uses thedashmap
crate.DashMap
offers an API similar to the standard library'sHashMap
, but stores the items innum_cpus()*4
individually lockedRwLock<HashMap<K, V>>
(called "shards") based on the key's hash. Splitting the items in shards allows to reduce how much of theDashMap
is locked at the same time, with concurrent download requests locking each other only inside a single shard.Persisting download counts also takes advantage of sharding: to avoid "stopping the world" when the background thread starts persisting the counters only one shard is persisted at the time. The current configuration persists all shards over the span of one minute, and that's configurable at runtime with the
DOWNLOADS_PERSIST_INTERVAL_MS
environment variable.To avoid losing downloads the code also persists all the shards at the same time when the server gracefully shuts down. There is still the possibility of losing download counts if the server is killed without a chance of gracefully shutting down (for example due to platform issues), but the possibility of losing less a minute of download counts is far less important than reducing database load.
r? @jtgeibel