Description
One feature we've talked about for a long time but never got around to implementing is pipelined rustc compilation. Say we have a crate A and a crate B that depends on A, and that both are being compiled as rlibs. Today, when compiling this project, Cargo compiles A first and then waits for it to finish completely before starting to compile B.
In reality, though, the compiler doesn't need the full compilation results of A to start B. Instead the compiler only needs metadata from A to start compiling B. Ideally Cargo would start compiling B as soon as A's metadata is ready to go.
This idea of pipelining rustc and starting rustc sooner doesn't reduce the overall work being done on each build, but it does in theory greatly increase the parallelism of the build, as we can spawn rustc sooner and keep all of a machine's cores warm doing Rust compilation. This is expected to have even bigger wins in release mode, where post-metadata work in the compiler often takes quite some time. Furthermore, incremental release builds should see huge wins, because during incremental rebuilds of a chain of crates you can keep all cores busy instead of just dealing with one crate at a time.
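As a rough illustration of the parallelism argument, here is a toy cost model for a linear chain of rlib crates. The `CrateCost` type, the two functions, and the time units are all made up for this sketch, and it assumes there are always enough idle cores to start a crate the moment its dependency's metadata is ready.

```rust
/// Hypothetical cost model for one crate: time until its metadata is
/// ready, and the remaining time (codegen and so on) after that point.
#[derive(Clone, Copy, Debug)]
pub struct CrateCost {
    pub metadata: u32,
    pub post_metadata: u32,
}

/// Today's behaviour for a chain where each crate depends on the previous
/// one: every crate waits for its dependency to finish completely.
pub fn sequential_makespan(chain: &[CrateCost]) -> u32 {
    chain.iter().map(|c| c.metadata + c.post_metadata).sum()
}

/// Pipelined behaviour for the same chain: each crate starts as soon as
/// the previous crate's metadata is ready. Assumes unlimited cores.
pub fn pipelined_makespan(chain: &[CrateCost]) -> u32 {
    let mut start = 0; // time at which the current crate may start
    let mut makespan = 0; // latest finish time seen so far
    for c in chain {
        let finish = start + c.metadata + c.post_metadata;
        makespan = makespan.max(finish);
        // The next crate only needs this crate's metadata.
        start += c.metadata;
    }
    makespan
}
```

For a chain of three crates that each spend 2 units on metadata and 8 units afterwards, `sequential_makespan` gives 30 while `pipelined_makespan` gives 14. The total work is the same 30 units in both cases; only the overlap changes.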
There are three main parts of this implementation that need to happen:
- First, the compiler needs to be fixed to actually accept metadata as input for dependencies when producing an rlib. I believe it currently requires all inputs to be rlibs, but this shouldn't be too hard to fix. (The tracking issue for the rustc changes is rust#58465, Implement "pipelined" rustc compilation (compiler side).)
- Second, the compiler will need to send a signal to Cargo as soon as metadata is produced. Cargo will instruct rustc to produce both metadata and an rlib. Once Cargo gets the signal that metadata is ready, rustc will continue producing the rlib while Cargo starts whatever subsequent rustc invocations it can. Note that by "signal" I don't mean Unix signals, but rather something like a message over a TCP socket.
- Finally Cargo will need to be refactored to listen for these signals, fork off more work in parallel, and properly synchronize the results of everything. This is where I suspect the bulk of the work will be happening (hence the issue in this repository).
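To make the three parts above a bit more concrete, here is a minimal sketch of the scheduling loop under some loud assumptions: threads stand in for rustc processes, an in-process `std::sync::mpsc` channel stands in for whatever transport the real signal ends up using, and the `Event` type and `build` function are hypothetical names, not anything in Cargo or rustc today.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical events recorded by the scheduler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Started(usize),
    MetadataReady(usize),
    Finished(usize),
}

/// Builds a graph of rlib crates, where `deps[i]` lists the crates that
/// crate `i` depends on, and returns the events in the order observed.
pub fn build(deps: &[Vec<usize>]) -> Vec<Event> {
    let n = deps.len();
    let (tx, rx) = mpsc::channel();
    let mut log = Vec::new();
    let mut started = vec![false; n];
    let mut metadata_ready = vec![false; n];
    let (mut in_flight, mut done) = (0usize, 0usize);
    let mut handles = Vec::new();

    loop {
        // Start every crate whose dependencies all have metadata available.
        for i in 0..n {
            if !started[i] && deps[i].iter().all(|&d| metadata_ready[d]) {
                started[i] = true;
                in_flight += 1;
                log.push(Event::Started(i));
                let tx = tx.clone();
                handles.push(thread::spawn(move || {
                    // Stand-in for rustc: announce metadata, then the rlib.
                    tx.send(Event::MetadataReady(i)).unwrap();
                    tx.send(Event::Finished(i)).unwrap();
                }));
            }
        }
        if done == n {
            break;
        }
        assert!(in_flight > 0, "dependency cycle: nothing left to run");

        // Block until some compilation reports progress.
        let event = rx.recv().expect("scheduler still holds a sender");
        match event {
            Event::MetadataReady(i) => metadata_ready[i] = true,
            Event::Finished(_) => {
                in_flight -= 1;
                done += 1;
            }
            Event::Started(_) => unreachable!("workers never send Started"),
        }
        log.push(event);
    }

    for handle in handles {
        handle.join().unwrap();
    }
    log
}
```

Calling `build(&[vec![], vec![0]])` models the A and B example: the log always begins with `Started(0)`, `MetadataReady(0)`, `Started(1)`, so B starts before A's `Finished` event is processed. The real work in Cargo is of course much larger than this (job limits, error handling, freshness tracking), so treat this purely as a picture of the message flow.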
In the ideal world the compiler would also wait just before linking for Cargo to let it know that all dependencies are ready. That's somewhat difficult, however, so I think it's probably best to start out incrementally and simply say that Cargo doesn't start a compilation that requires linking until all dependencies are finished (as it does today).
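The incremental rule described here can be sketched as a small readiness check. The `DepState` and `Output` enums and the `can_start` function are hypothetical names for illustration only. The sketch assumes `deps` covers every dependency that feeds into the link, including transitive ones, since with pipelining a direct dependency's rlib can be finished while its own dependencies are still in codegen.

```rust
/// Hypothetical state of a dependency as seen by the scheduler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DepState {
    Pending,
    MetadataReady,
    Finished,
}

/// Hypothetical split between outputs that need linking and rlibs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Output {
    Rlib,
    Linked,
}

/// Returns true if a compilation producing `output` may start given the
/// states of its dependencies.
pub fn can_start(output: Output, deps: &[DepState]) -> bool {
    match output {
        // An rlib only needs metadata from each dependency.
        Output::Rlib => deps.iter().all(|s| *s != DepState::Pending),
        // Anything that links waits for full completion, as today.
        Output::Linked => deps.iter().all(|s| *s == DepState::Finished),
    }
}
```

With one dependency at `MetadataReady` and another at `Finished`, `can_start(Output::Rlib, ..)` is true while `can_start(Output::Linked, ..)` is false, which is exactly the conservative behaviour proposed as a starting point.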
I've talked with @ehuss about possibly implementing this, as well as with the compiler team about the idea, and I think this is a small enough chunk of work (although certainly not trivial) to be done in the near future!