Description
Problem
By default cargo includes everything from the root of the package (or almost, see https://doc.rust-lang.org/cargo/reference/manifest.html?highlight=files#the-exclude-and-include-fields).
This is very nice for archiving purposes by crates.io, it allows example scraping from docs.rs, and it probably serves other purposes I haven't encountered. But it has an issue: it makes lots of CI downloads heavier than they need to be.
Looking through the dependencies we use at $work, I see around 100MiB of unused files just counting tests/, but there are also .github, benches/, examples/ and probably more I missed. And we have only ~600 deps out of the 138k crates on crates.io (without even counting all the available versions).
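For anyone who wants to reproduce that kind of estimate, here is a rough sketch of how it could be measured. It assumes the dependencies are already unpacked in a local registry source directory (typically a directory under ~/.cargo/registry/src/) and that only the four top-level directories named above count as "extras". The function names are made up for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Top-level directories assumed (for this sketch) to be unused when
/// building a crate as a dependency.
const EXTRA_DIRS: [&str; 4] = ["tests", "benches", "examples", ".github"];

/// Recursively sums the size in bytes of all regular files under `path`.
fn dir_size(path: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(path)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            total += dir_size(&entry.path())?;
        } else if file_type.is_file() {
            total += entry.metadata()?.len();
        }
    }
    Ok(total)
}

/// Sums the size of the extra directories across every unpacked crate
/// found directly inside `registry_src`.
fn extras_size(registry_src: &Path) -> io::Result<u64> {
    let mut total = 0;
    for krate in fs::read_dir(registry_src)? {
        let krate = krate?.path();
        if !krate.is_dir() {
            continue;
        }
        for name in EXTRA_DIRS.iter() {
            let candidate = krate.join(name);
            if candidate.is_dir() {
                total += dir_size(&candidate)?;
            }
        }
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // Expects the registry source directory as the first argument.
    match std::env::args().nth(1) {
        Some(dir) => {
            let bytes = extras_size(Path::new(&dir))?;
            println!("{:.1} MiB of extras", bytes as f64 / (1024.0 * 1024.0));
        }
        None => eprintln!("usage: extras-size <registry src dir>"),
    }
    Ok(())
}
```

Note that this measures unpacked sizes, so the number is an upper bound on what would be saved on the compressed .crate downloads.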
We have a cache to avoid spamming crates.io whenever possible, so it's not like we request an extra 100MiB on each CI run, but still, I don't like putting pressure on it if we can avoid it.
As linked above, the exclude and include fields (https://doc.rust-lang.org/cargo/reference/manifest.html?highlight=files#the-exclude-and-include-fields) are intended to help with that, but it's not automatic: people need to consider and maintain them for each crate (and from the start, else older versions will still have all the unused files).
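For reference, this is roughly what a crate author has to write today to opt out by hand. The crate name and version are placeholders, and the list of directories is just the one from this issue, not a recommendation.

```toml
[package]
name = "my-crate"   # placeholder name
version = "0.1.0"
edition = "2021"
# gitignore-style patterns; a leading slash anchors them to the package root.
# Keeps test, benchmark, example and CI files out of the published archive.
exclude = ["/tests", "/benches", "/examples", "/.github"]
```

Running cargo package --list shows which files would end up in the archive, which helps check the patterns. Excluding examples/ this way also removes them from what docs.rs can scrape, so it is a trade-off the author has to make per crate.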
Proposed Solution
We may need a way to tell cargo to only fetch what is absolutely necessary to build the crate. Two issues with that:
- What is the "absolute necessary": probably at least src/, Cargo.toml, license files, build.rs? Some -sys crates may need more.
- Changing that silently in a backward compatible way is probably hard? I think it would break the hashes, so at the very least it could not become the default automatically.
It could take the form of cargo fetch --no-extras (to be bikeshedded)?
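To make the first question above a bit more concrete, here is a minimal sketch of what the "absolute necessary" rule could look like as a filter on package-relative paths. Everything in it is an assumption: is_essential is a made-up name, the rule is just the list from the first bullet, and matching license files by a LICENSE/LICENCE prefix is a guess on my part. It is not how cargo works today.

```rust
use std::path::{Component, Path};

/// Hypothetical rule: would a `--no-extras` style fetch keep this file?
/// `rel` is a path relative to the package root, e.g. `src/lib.rs`.
fn is_essential(rel: &Path) -> bool {
    let mut components = rel.components();
    let first = match components.next() {
        Some(Component::Normal(name)) => name.to_string_lossy().to_string(),
        // Absolute paths, `.` and `..` prefixes are rejected in this sketch.
        _ => return false,
    };
    let is_top_level_file = components.next().is_none();
    if !is_top_level_file {
        // Only the top-level src/ directory is kept.
        return first == "src";
    }
    // Top-level files: the manifest, the build script, and license files
    // (assumed here to start with LICENSE or LICENCE, case-insensitive).
    let upper = first.to_uppercase();
    first == "Cargo.toml"
        || first == "build.rs"
        || upper.starts_with("LICENSE")
        || upper.starts_with("LICENCE")
}

fn main() {
    let samples = ["src/lib.rs", "Cargo.toml", "LICENSE-MIT", "tests/it.rs", "README.md"];
    for path in samples.iter() {
        println!("{:<12} keep = {}", path, is_essential(Path::new(path)));
    }
}
```

Even this small rule shows the problem: a crate that pulls its README into its docs with include_str!, or a -sys crate that vendors C sources outside src/, would no longer build, so a fixed list is probably not enough on its own.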
Notes
Note: I'm excluding binary crates here, but they could easily be included too, if only so cargo install downloads less if possible. I don't think binary crates are downloaded as much as library ones, though.
I also don't know if the benefits would be worth it, since it could mean one of:
- Recomputing the set of files on each download
- Storing two archives of the same crate&version
I have no data saying the bandwidth gains would be enough to offset either the compute or space costs. If anyone knows (the infra team?), I would be happy to be proven wrong!