Description
TL;DR: This is a problem we don’t have yet. I mostly want to record some information in case we do in the long term.
This comment: CocoaPods/CocoaPods#4989 (comment) explains how the CocoaPods/Specs repository gets so much traffic that GitHub rate-limits it severely, causing fetches to take a very long time or fail.
> We understand that part of the CocoaPods workflow is that its end users (i.e., not just the people contributing to CocoaPods/Specs) fetch regularly from GitHub.
This sounds exactly like rust-lang/crates.io-index.
Rate-limiting from GitHub has not been a problem for us as far as I know, but there may be some precautions we can take to avoid it.
> Apparently, most of the initial clones are shallow, meaning that not the whole history is fetched, but just the top commit. But then subsequent fetches don't use the `--depth=1` option. Ironically, this practice can be much more expensive than full fetches/clones, especially over the long term. It is usually preferable to pay the price of a full clone once, then incrementally fetch into the repository, because then Git is better able to negotiate the minimum set of changes that have to be transferred to bring the clone up to date.
I think we’re OK here, since Cargo uses libgit2, which does not support shallow clones anyway.
> Finally, the layout of the repo itself doesn't help. Specifically, the `Specs` directory, which contains 16k+ subdirectories, causes some Git operations to be unexpectedly expensive, further driving up CPU usage.
Here as well we’re doing pretty well, since rust-lang/crates.io-index already has two levels of directory nesting, each named (roughly) after two characters from the start of the crate’s name. Counting letters only, that allows up to 26^4 = 456,976 leaf directories; for scale, npm has 249,825 packages right now.
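To make the nesting concrete, here is a rough sketch of how a crate name could map to its path in the index. The function name `index_path` is made up for illustration, and the special cases for names shorter than four characters reflect my understanding of the layout (the "roughly" above), so treat them as assumptions rather than a specification.

```rust
/// Hypothetical helper (not Cargo's actual code): relative path of a crate's
/// file inside the index, assuming the layout described above.
/// Returns `None` for empty or non-ASCII names, which this sketch does not handle.
fn index_path(name: &str) -> Option<String> {
    if name.is_empty() || !name.is_ascii() {
        return None;
    }
    // Assumption: index paths use the lowercased crate name.
    let name = name.to_ascii_lowercase();
    let path = match name.len() {
        // Assumed special cases for short names.
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        3 => format!("3/{}/{}", &name[..1], name),
        // General case: two levels, two characters each.
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    };
    Some(path)
}

/// Upper bound on leaf directories when both levels use two characters
/// drawn from an alphabet of the given size.
fn max_leaf_dirs(alphabet_size: u64) -> u64 {
    alphabet_size.pow(4)
}
```

With this sketch, `index_path("serde")` yields `se/rd/serde`, and `max_leaf_dirs(26)` is the 456,976 figure quoted above.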
Another comment CocoaPods/CocoaPods#4989 (comment) suggests:
> this new, preview API should help: https://developer.github.com/changes/2016-02-24-commit-reference-sha-api/. It's helped Homebrew dramatically reduce the number of no-op `git fetch`s which also will make things better for your users as a no-op API HTTP call is significantly faster for you (and less expensive for GitHub) than a no-op `git fetch`.
This sounds beneficial even if we don’t hit rate-limiting. I’ve filed #2451 separately.
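As a rough sketch of what such a check might look like on the client side: ask the API for the tip commit of a ref, passing the locally known SHA as an ETag, and only run `git fetch` if the answer is not "304 Not Modified". Everything below is an assumption for illustration: the function names are hypothetical, the `Accept` media type is a guess (the preview period used its own media type, see the linked announcement), and the actual HTTP call is left out so that only the request construction and the decision logic are shown.

```rust
// Assumption: media type asking the API to return just the commit SHA.
// During the preview period a dedicated preview media type was required.
const ACCEPT_SHA: &str = "application/vnd.github.v3.sha";

/// Hypothetical helper: URL for the tip commit of `reference` (e.g. "master").
fn request_url(owner: &str, repo: &str, reference: &str) -> String {
    format!(
        "https://api.github.com/repos/{}/{}/commits/{}",
        owner, repo, reference
    )
}

/// Hypothetical helper: headers for the conditional request.
/// The locally known SHA is sent as a quoted ETag in `If-None-Match`.
fn request_headers(local_sha: &str) -> Vec<(String, String)> {
    vec![
        ("Accept".to_string(), ACCEPT_SHA.to_string()),
        ("If-None-Match".to_string(), format!("\"{}\"", local_sha)),
    ]
}

/// Decide whether a real `git fetch` is needed, given the HTTP status code.
/// 304 means the remote tip matches the local SHA, so the fetch would be a no-op.
/// Any other status (new commits, rate limiting, errors) falls back to fetching.
fn should_fetch(status: u16) -> bool {
    status != 304
}
```

Falling back to `git fetch` on any status other than 304 keeps the API call a pure optimization: if the API is unavailable or rate-limited, behavior is the same as today.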