Description
The topic of "offline" Cargo has come up quite a few times historically, and with the advent of this reddit post the @rust-lang/cargo team got thinking about this problem again and how we might solve it. The discussion sprawled a little, but this issue is intended to be the distillation of the core feature required to allow Cargo to work more easily in offline-like situations.
As some background, this issue is intended to be useful in situations such as when you're on an airplane, on the subway, or in a tooling situation where you don't want to make extraneous network requests. In these situations Cargo's default behavior, updating the crates.io index, can often fail. Furthermore, why Cargo updates the index is often subtle and opaque, causing quite a bit of frustration when users don't expect network traffic to happen and then it does!
An example situation that we'd like to enable is when you work with Rust on a day-to-day basis, perhaps with a good number of Cargo projects. In this case you've got a global crate cache in `$HOME/.cargo/registry`, likely with a relatively up-to-date index and a semi-populated crate cache. This means that if you were to get on a plane and then want to start a similar project, namely one that shares dependencies with crates you've previously worked on, in theory all the data is there for Cargo to consume. Today, however, any project missing a lock file will attempt to update the registry, failing in "airplane" situations.
The key insight behind this issue is that the new flag proposed here will change Cargo's crate graph resolution behavior. Namely, Cargo will generate a different `Cargo.lock` depending on whether this flag is passed or not. The purpose of this is to ensure that Cargo's default behavior today (which is desirable in many situations) isn't affected too much by this feature.
With all that in mind, I believe the steps for implementing this would look something like the following (although perhaps not exhaustively):
- First, you'll add a new unstable feature to Cargo. That's done around here and you'll probably want to call it something like `airplane`. This means that your flag will be activated with something like `cargo build -Z airplane`.
- You'll be accessing the list of unstable features through `Config::cli_unstable`. The `Config` structure is ubiquitous throughout Cargo and is in general intended to house CLI configuration or other global concerns in Cargo.
- Airplane mode will sort of enable "frozen mode", which is where Cargo is disallowed from talking to the internet. We want to be sure that if Cargo accidentally tries to hit the internet a loud warning happens, and otherwise the rest of the support below will be to prevent Cargo from trying to hit the network. To do this you'll want to update the `network_allowed` method.
- Most of the rest of the code will go in `sources/registry`. This module contains all the support necessary for using a registry (crates.io) as a remote (aka you'll download things).
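To make the first three steps concrete, here is a rough sketch of how the flag could flow from `-Z airplane` into `network_allowed`. The `CliUnstable` and `Config` types below are pared-down stand-ins written for this example, not Cargo's real definitions, and the `parse` and `new` helpers are hypothetical.

```rust
/// Simplified stand-in for the set of `-Z` flags (hypothetical shape).
#[derive(Debug, Default)]
pub struct CliUnstable {
    pub airplane: bool,
}

impl CliUnstable {
    /// Parse the values passed after `-Z`, e.g. `["airplane"]`.
    pub fn parse(flags: &[&str]) -> Result<CliUnstable, String> {
        let mut unstable = CliUnstable::default();
        for flag in flags {
            match *flag {
                "airplane" => unstable.airplane = true,
                other => return Err(format!("unknown `-Z` flag specified: {}", other)),
            }
        }
        Ok(unstable)
    }
}

/// Simplified stand-in for Cargo's `Config` (the real one holds far more).
pub struct Config {
    frozen: bool,
    unstable: CliUnstable,
}

impl Config {
    pub fn new(frozen: bool, unstable: CliUnstable) -> Config {
        Config { frozen, unstable }
    }

    pub fn cli_unstable(&self) -> &CliUnstable {
        &self.unstable
    }

    /// The network is off limits both in frozen mode and in airplane mode.
    pub fn network_allowed(&self) -> bool {
        !self.frozen && !self.cli_unstable().airplane
    }
}
```

With this shape, any code path that already consults `network_allowed` before touching the network picks up airplane mode without further changes.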
This is where the work will likely start. The general idea of how this will be implemented is that the behavior of the `Registry` "source" (aka an implementation of the `Source` trait) will differ depending on whether `-Z airplane` is passed. Ideally this flag isn't too intrusive throughout Cargo, so we'll want a pretty localized implementation.
The first part that we'll tackle is the `update_index` method. This will want to immediately return `Ok` if the airplane mode is activated. This will prevent an index update from happening in airplane mode, even when Cargo would otherwise request it (for example a lock file is missing or a dependency was added).
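As a minimal sketch of that early return (the `RemoteRegistry` struct and its `fetches` counter are invented for illustration; the real method performs a git fetch of the index and has a different signature):

```rust
/// Hypothetical, simplified registry backend.
pub struct RemoteRegistry {
    airplane: bool,
    /// Counts simulated network fetches, for illustration only.
    pub fetches: u32,
}

impl RemoteRegistry {
    pub fn new(airplane: bool) -> RemoteRegistry {
        RemoteRegistry { airplane, fetches: 0 }
    }

    pub fn update_index(&mut self) -> Result<(), String> {
        // In airplane mode, pretend the update succeeded without doing anything.
        if self.airplane {
            return Ok(());
        }
        // Stand-in for the real index fetch.
        self.fetches += 1;
        Ok(())
    }
}
```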
Next up Cargo will need to change its view of the index in airplane mode. We know that we won't be able to download any packages as the network isn't available, so we need to ensure that Cargo's crate graph resolver never asks to download something. The way the crate graph resolver works is through the `Registry` trait, mostly the `query` method. This method takes a `Dependency` (basically a name and semver requirement) and then invokes the callback with all possible "summaries" (aka packages) that the source has.
When we're in airplane mode the number of possible summaries is far smaller than when we're not in airplane mode (we can't download things!). This means that we're going to want to filter the list of summaries that the crates.io registry reports down to just those we've downloaded to our local computer. To do this you'll probably want to update the `load_summaries` method.
In `load_summaries` we'll query the underlying index implementation (the `remote.rs` modified before) for all possible versions of a crate. The `lines` iterator in this block will be an iterator over each line of a file in the index (browsable at https://github.com/rust-lang/crates.io-index). Each line in this index is parsed and then pushed onto a local list of summaries. What we'll want to do is apply another filter on top of this. If the summary isn't downloaded to the local computer then we'll want to skip it.
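The extra filter described above might look roughly like this. It is only a sketch: `Summary` is a pared-down stand-in, the `<name> <version>` line format is an assumption made for brevity (real index lines are JSON), and the `is_downloaded` callback is hypothetical.

```rust
/// Hypothetical, pared-down summary: just a name and a version string.
#[derive(Debug, Clone, PartialEq)]
pub struct Summary {
    pub name: String,
    pub version: String,
}

/// Parse index lines and, in airplane mode, keep only summaries whose
/// crate is already on disk according to `is_downloaded`.
pub fn load_summaries<F>(lines: &[&str], airplane: bool, is_downloaded: F) -> Vec<Summary>
where
    F: Fn(&Summary) -> bool,
{
    lines
        .iter()
        .filter_map(|line| {
            // Assumed line format: "<name> <version>"; malformed lines are skipped.
            let mut parts = line.split_whitespace();
            let name = parts.next()?;
            let version = parts.next()?;
            Some(Summary {
                name: name.to_string(),
                version: version.to_string(),
            })
        })
        // Outside airplane mode every summary is kept, as today.
        .filter(|summary| !airplane || is_downloaded(summary))
        .collect()
}
```

Keeping the filter behind the `airplane` check means the resolver sees exactly the same candidates as before when the flag is not passed.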
In other words, the index will report, for example, that we could use libc 0.1.4 or libc 0.2.0. If we only have libc 0.2.0 then we'll want to report back that only libc 0.2.0 is available, whereas if the airplane flag were not passed we'd report both 0.1.4 and 0.2.0 as being available. To do this you'll probably want to add a method to the `RegistryData` trait like `is_crate_downloaded(&self, id: &PackageId)` and the function would look something like this, testing if the `.crate` file exists.
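One possible shape for that method, as a sketch only: the `PackageId` and `LocalCache` types here are simplified stand-ins, the trait shows just the proposed method, and the sketch assumes downloaded crates sit in a single cache directory as `<name>-<version>.crate`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, pared-down package id.
pub struct PackageId {
    pub name: String,
    pub version: String,
}

/// Only the proposed addition is shown; the real trait has more methods.
pub trait RegistryData {
    fn is_crate_downloaded(&self, id: &PackageId) -> bool;
}

/// Stand-in for a registry backend that knows where its crate cache lives.
pub struct LocalCache {
    cache_dir: PathBuf,
}

impl LocalCache {
    pub fn new(cache_dir: &Path) -> LocalCache {
        LocalCache { cache_dir: cache_dir.to_path_buf() }
    }

    /// Assumed file layout: `<cache_dir>/<name>-<version>.crate`.
    fn crate_path(&self, id: &PackageId) -> PathBuf {
        self.cache_dir.join(format!("{}-{}.crate", id.name, id.version))
    }
}

impl RegistryData for LocalCache {
    fn is_crate_downloaded(&self, id: &PackageId) -> bool {
        // A crate counts as downloaded if its `.crate` file exists on disk.
        self.crate_path(id).is_file()
    }
}
```

Note that this only checks for existence; a partially written `.crate` file would still count as downloaded, which a real implementation may want to guard against.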
At this point the airplane flag should (a) avoid updating the index even if asked and (b) ensure that we never ask to download a crate that's not already downloaded. With that, I believe the feature should be effectively done! You should be able to write some tests, play around with it locally, etc, and see it all working.
There are, however, a number of extensions that will be required for stabilization, so I'll write those down as well:
Git repositories
In addition to crates.io we'll want to handle git repositories and the airplane flag as well. This one may be a bit easier where basically what we want to do is to ensure Cargo is guided towards not asking for a network update by altering the behavior of a git source.
Git repositories in Cargo are modeled in two locations. Each git source has a "database" which is a bare git checkout in `~/.cargo/git/db`. This database is basically just a store of all fetched objects from the remote. Checkouts then happen at `~/.cargo/git/checkouts`. Each checkout is permanently cached and looks like `~/.cargo/git/checkouts/$name-$hash/$sha`.
With that in mind airplane mode for git repositories would at a high level just avoid updating the database and then otherwise the checkout would be cloned from the database as usual. I think that most of this will just fall out of updating this if branch.
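A sketch of the decision this amounts to is below. The `DbAction` enum and `plan_db_update` function are invented for this example, and erroring out when no database exists at all in airplane mode is an assumption rather than something settled in this issue.

```rust
/// What a git source should do before checking out a revision (hypothetical).
#[derive(Debug, PartialEq)]
pub enum DbAction {
    /// Update the bare database from the remote.
    Fetch,
    /// Skip the network and clone the checkout from the existing database.
    UseCached,
}

/// `db_exists`: the bare database is already on disk.
/// `has_rev`: the database already contains the revision we want.
pub fn plan_db_update(airplane: bool, db_exists: bool, has_rev: bool) -> Result<DbAction, String> {
    // With a database on disk, airplane mode never fetches. The checkout
    // itself may still fail later if the wanted revision is missing.
    if db_exists && (airplane || has_rev) {
        return Ok(DbAction::UseCached);
    }
    // Assumption: with nothing cached there is nothing to fall back on.
    if airplane {
        return Err("git repository is not cached and `-Z airplane` was passed".to_string());
    }
    Ok(DbAction::Fetch)
}
```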
Note that right now any git repository with submodules won't work. We currently don't clone submodules from the global database but instead re-clone each submodule from the network on all checkouts. Once we fix that bug (fetch all submodules to the database, then clone from there onto the disk) then it should also "just work"!
Recommend the "airplane flag"
If you're on an airplane and are unaware of the airplane flag then it would be quite nice to teach you about it! This means that when the resolution process fails with something that looks like a network error we probably want to tweak the error message with a "did you mean" style hint. The idea here is that if you're on a plane and type `cargo build` then Cargo should ideally say "you should try using `-Z airplane`".
I think the way we'll probably want to do this is to test for spurious network errors whenever we update a source. If it looks like a spurious network error, then we can probably attach some context saying "this may work if you instead pass `-Z airplane`" or something like that.
You probably want to test this out by disconnecting your network and seeing what the error looks like.
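For illustration, attaching the hint could look something like this. The `UpdateError` type, its variants, and the exact wording of the note are all made up for this sketch; how Cargo actually classifies an error as network-related is not shown here.

```rust
use std::fmt;

/// Hypothetical error from updating a source.
#[derive(Debug)]
pub enum UpdateError {
    /// Something that looks like a (possibly spurious) network failure.
    Network(String),
    /// Anything else.
    Other(String),
}

impl fmt::Display for UpdateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UpdateError::Network(msg) => write!(f, "network failure: {}", msg),
            UpdateError::Other(msg) => write!(f, "{}", msg),
        }
    }
}

/// Render the error, adding a hint when it looks network-related and
/// the user has not already passed `-Z airplane`.
pub fn with_airplane_hint(err: &UpdateError, airplane: bool) -> String {
    let mut msg = err.to_string();
    if let UpdateError::Network(_) = err {
        if !airplane {
            msg.push_str("\nnote: this may work if you instead pass `-Z airplane`");
        }
    }
    msg
}
```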
Resolution errors
One of the primary failure modes of the "airplane flag" is that you added a dependency which wasn't previously cached or otherwise the local state of the index/crate cache isn't able to build a crate. This may happen because we're not updating the index (new crates/versions aren't available) or because we're filtering the return value of `load_summaries` on the registries (not all entries in the on-disk index would have been downloaded).
In any case we want to make sure that intentional or accidental use of the `-Z airplane` flag doesn't cause overly obscure errors. Right now Cargo has pretty bad crate graph resolution errors, unfortunately.
The failure mode here is likely to come out of resolution, not querying for crates. This means that the error is generated in this module which is one of the gnarliest modules in Cargo. You may want to skim it, but I think the main location to modify is this one which is the main source for generating resolution errors.
The error here should basically say something along the lines of "crate resolution failed, we see you're passing `-Z airplane` and it may be failing because of that".
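A tiny sketch of the message shape, with hypothetical wording and a hypothetical `resolution_error` helper (the real resolver errors carry much more context than a name and a requirement):

```rust
/// Build a resolution failure message, mentioning airplane mode when it
/// is active since it is a likely cause of the failure.
pub fn resolution_error(name: &str, req: &str, airplane: bool) -> String {
    let mut msg = format!(
        "no matching version `{}` found for package `{}`",
        req, name
    );
    if airplane {
        msg.push_str(
            "\nnote: `-Z airplane` was passed, so only crates that are already \
             downloaded were considered; this may be why resolution failed",
        );
    }
    msg
}
```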
Failing cargo update
Similarly to weird resolution errors, a `cargo update` is basically guaranteed not to work. We should bail out of updating as early as we can if you invoke `cargo update -Z airplane`.
Populating the global cache
Right now Cargo has no explicit way of populating the global cache. That means this is currently only catering to the use case of "I develop Rust locally and hence have a pretty populated global cache". This isn't, however, catering to the case where you're trying out a dependency for the first time on a plane.
We'll eventually want a subcommand which downloads crates and probably their transitive dependencies, as well as explicitly updates the index. The design here is a little unclear, but if you have ideas please let us know!