Scaling registry updates

**TL;DR:** This is a problem we don’t have yet. I mostly want to record some information in case we run into it in the long term.

---

This comment: https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomment-193772935 explains how the CocoaPods/Specs repository gets so much traffic that GitHub rate-limits it severely, causing fetches to take a very long time or fail.

> We understand that part of the CocoaPods workflow is that its _end users_ (i.e., not just the people _contributing_ to CocoaPods/Specs) fetch regularly from GitHub

This sounds exactly like rust-lang/crates.io-index.

Rate-limiting from GitHub has not been a problem for us as far as I know, but there may be some precautions we can take to avoid it.

> Apparently, most of the initial clones are _shallow_, meaning that not the whole history is fetched, but just the top commit. But then subsequent fetches don't use the `--depth=1` option. Ironically, this practice can be much more expensive than full fetches/clones, especially over the long term. It is usually preferable to pay the price of a full clone once, then incrementally fetch into the repository, because then Git is better able to negotiate the minimum set of changes that have to be transferred to bring the clone up to date.

I think we’re OK here, since Cargo uses libgit2, which does not support shallow clones anyway.

> Finally, the layout of the repo itself doesn't help. Specifically, the `Specs` directory, which contains 16k+ subdirectories, causes some Git operations to be unexpectedly expensive, further driving up CPU usage.

We’re doing pretty well here too, since rust-lang/crates.io-index already has two levels of directory nesting, each (roughly) named after two characters from the start of the crate name. Counting only the 26 letters, that allows up to 26^4 = 456,976 leaf directories; for comparison, npm has 249,825 packages right now.
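To make the nesting concrete, here is a rough Python sketch of how a crate name could map to a path in the index. The function name `index_path` is made up for illustration, and the special-casing of names shorter than four characters is my understanding of the index layout rather than something stated above, so treat it as an assumption.

```python
def index_path(crate_name: str) -> str:
    """Return the (assumed) relative path of a crate's file in the index."""
    name = crate_name.lower()
    if not name:
        raise ValueError("crate name must not be empty")
    # Assumption: short names get their own top-level buckets,
    # because they cannot fill two two-character directory levels.
    if len(name) == 1:
        return f"1/{name}"
    if len(name) == 2:
        return f"2/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    # General case: first two characters, then the next two, then the name.
    return f"{name[:2]}/{name[2:4]}/{name}"


# Upper bound on leaf directories if only the 26 letters are counted.
LETTER_ONLY_LEAVES = 26 ** 4
```

For example, `index_path("serde")` gives `se/rd/serde`, so no single directory has to hold every crate the way `Specs` does for CocoaPods.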

Another comment https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomment-193801376 suggests:

> this new, preview API should help: https://developer.github.com/changes/2016-02-24-commit-reference-sha-api/. It's helped Homebrew dramatically reduce the number of no-op `git fetch`s which also will make things better for your users as a no-op API HTTP call is significantly faster for you (and less expensive for GitHub) than a no-op `git fetch`. 

This sounds beneficial even if we don’t hit rate-limiting. I’ve filed #2451 separately.
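As a rough illustration of the idea (not what Cargo does today), the check could look something like the Python sketch below: ask the API for the SHA of the branch head, pass the locally known SHA as an ETag, and only run `git fetch` when the answer is not `304 Not Modified`. The helper names are hypothetical, and the `Accept` media type shown is an assumption on my part; the linked announcement describes a preview media type, so check it before relying on this.

```python
import urllib.error
import urllib.request


def build_head_check(owner: str, repo: str, ref: str, local_sha: str) -> urllib.request.Request:
    """Build a conditional request for the SHA that `ref` currently points to."""
    url = f"https://api.github.com/repos/{owner}/{repo}/commits/{ref}"
    req = urllib.request.Request(url)
    # Assumption: this media type makes the API return only the commit SHA.
    req.add_header("Accept", "application/vnd.github.sha")
    # The locally known SHA is sent as a quoted ETag value.
    req.add_header("If-None-Match", f'"{local_sha}"')
    return req


def remote_has_new_commits(req: urllib.request.Request) -> bool:
    """Return False on 304 Not Modified, True if the server sends a new SHA."""
    try:
        with urllib.request.urlopen(req, timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing changed upstream, so the git fetch can be skipped.
            return False
        raise
```

A caller would build the request with the SHA of its local index checkout and fall back to a normal fetch whenever `remote_has_new_commits` returns `True` or the API call fails.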

