
Provide a way to not download tests/, examples/ and other such common directories #13491

Open
@poliorcetics


Problem

By default, cargo includes (almost) everything under the root of the package when publishing it; see https://doc.rust-lang.org/cargo/reference/manifest.html?highlight=files#the-exclude-and-include-fields for the exceptions.

This is very nice for archiving by crates.io, it enables example scraping on docs.rs, and it probably serves other purposes I haven't encountered, but it has a downside: it makes many CI downloads heavier than they need to be.

Looking through the dependencies we use at $work, I see around 100 MiB of unused files in tests/ alone, and there are also .github/, benches/, examples/ and probably more I missed. That is for only ~600 of our dependencies out of the 138k crates on crates.io, without even counting all the published versions.
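A rough way to reproduce a measurement like this is to walk the sources Cargo has already extracted under $CARGO_HOME/registry/src/ (one directory per registry, e.g. ~/.cargo/registry/src/index.crates.io-<hash>/, containing one <name>-<version>/ directory per crate) and add up the bytes under each crate's top-level tests/, benches/, examples/ and .github/ directories. This is a minimal standard-library sketch; the function names and the directory list are illustrative assumptions, not something Cargo provides. It reports uncompressed sizes, so the savings on the wire would be smaller than these numbers.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Top-level directories assumed to be unneeded when building a dependency.
/// This list is taken from the issue text; it is not a Cargo setting.
const EXTRA_DIRS: &[&str] = &["tests", "benches", "examples", ".github"];

/// Recursively sums the size in bytes of every regular file under `dir`.
fn dir_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            total += dir_size(&entry.path())?;
        } else if file_type.is_file() {
            total += entry.metadata()?.len();
        }
    }
    Ok(total)
}

/// For a directory of extracted crates (one `<name>-<version>` subdirectory each),
/// returns the total number of bytes found under each entry of `EXTRA_DIRS`.
fn extra_bytes(crates_dir: &Path) -> io::Result<BTreeMap<&'static str, u64>> {
    let mut totals = BTreeMap::new();
    for krate in fs::read_dir(crates_dir)? {
        let krate = krate?;
        if !krate.file_type()?.is_dir() {
            continue;
        }
        for name in EXTRA_DIRS {
            let candidate = krate.path().join(name);
            if candidate.is_dir() {
                *totals.entry(*name).or_insert(0) += dir_size(&candidate)?;
            }
        }
    }
    Ok(totals)
}

fn main() -> io::Result<()> {
    // Pass one registry source directory, e.g. ~/.cargo/registry/src/index.crates.io-<hash>/
    let Some(dir) = std::env::args().nth(1) else {
        eprintln!("usage: extra-bytes <registry-src-dir>");
        return Ok(());
    };
    for (name, bytes) in extra_bytes(Path::new(&dir))? {
        println!("{name:>10}: {:.1} MiB", bytes as f64 / (1024.0 * 1024.0));
    }
    Ok(())
}
```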

We have a cache to avoid hitting crates.io whenever possible, so we don't actually request an extra 100 MiB on every CI run, but I still don't like putting pressure on crates.io when we can avoid it.

As linked above, the exclude and include fields (https://doc.rust-lang.org/cargo/reference/manifest.html?highlight=files#the-exclude-and-include-fields) are intended to help with that, but they are not automatic: maintainers need to think about them and keep them up to date for each crate, and do so from the very first release, otherwise older versions will still ship all the unused files.

Proposed Solution

We may need a way to tell cargo to fetch only what is strictly necessary to build the crate. There are two issues with that:

  • What counts as "strictly necessary"? Probably at least src/, Cargo.toml, license files and build.rs, but some -sys crates may need more.
  • Changing this silently in a backward-compatible way is probably hard: I think it would break the hashes, so at the very least it could not become the default automatically.
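To make the first question concrete, here is a minimal sketch of an "only what is needed" rule, written as a predicate over paths relative to the package root. The keep-list (Cargo.toml, build.rs, everything under src/, top-level files whose names start with LICENSE or COPYING) is an assumption for illustration, and is_needed_for_build is a hypothetical name. It is not safe in general: a manifest can point package.build, lib.path or package.license-file elsewhere, and include_str!/include_bytes! can read files from any directory (a crate doing #![doc = include_str!("../README.md")] would break), which is exactly the kind of case where some -sys crates need more.

```rust
use std::path::{Component, Path};

/// Hypothetical heuristic: returns true if the file at `rel` (relative to the
/// package root) is assumed to be needed to build the crate as a dependency.
fn is_needed_for_build(rel: &Path) -> bool {
    let mut components = rel.components();
    let first = match components.next() {
        Some(Component::Normal(name)) => name.to_string_lossy(),
        // Absolute paths, `..` or empty paths are never package files.
        _ => return false,
    };
    // True when `rel` has no further components, i.e. it sits at the package root.
    let is_top_level_file = components.next().is_none();

    match &*first {
        // Keep everything inside src/ (but not a top-level file named "src").
        "src" => !is_top_level_file,
        // The manifest and the conventional build script location.
        "Cargo.toml" | "build.rs" => is_top_level_file,
        // License files such as LICENSE, LICENSE-MIT or COPYING.
        name if name.starts_with("LICENSE") || name.starts_with("COPYING") => is_top_level_file,
        // tests/, benches/, examples/, .github/ and everything else are dropped.
        _ => false,
    }
}

fn main() {
    for p in ["Cargo.toml", "src/lib.rs", "LICENSE-MIT", "tests/it.rs", ".github/workflows/ci.yml"] {
        println!("{p:<28} keep = {}", is_needed_for_build(Path::new(p)));
    }
}
```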

It could take the form of cargo fetch --no-extras (name to be bikeshedded).

Notes

Note: I'm excluding binary crates here. They could easily be included too, if only so that cargo install downloads less when possible, but I don't think binary crates are downloaded nearly as often as library ones.

I also don't know whether the benefits would be worth it, since it would mean one of:

  • Recomputing the set of files on each download
  • Storing two archives of the same crate and version
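The first option, recomputing on download, amounts to filtering the entry list of each .crate archive, which is a gzip-compressed tarball whose entries all live under <name>-<version>/. The sketch below assumes that entry list (paths and sizes) is already available, since actually reading the archive would need crates such as tar and flate2. It uses a denylist of the directories named in the issue rather than a keep-list, on the assumption that dropping only known extras is less likely to remove a file the build actually reads. Entry and trim are hypothetical names.

```rust
/// One entry of a `.crate` tarball: its path inside the archive and its size in bytes.
struct Entry<'a> {
    path: &'a str,
    size: u64,
}

/// Top-level directories dropped by this hypothetical denylist.
const DROPPED: &[&str] = &["tests", "benches", "examples", ".github"];

/// Returns the paths kept in a trimmed archive for `name`-`version`,
/// plus the total number of bytes that would be dropped.
fn trim<'a>(name: &str, version: &str, entries: &[Entry<'a>]) -> (Vec<&'a str>, u64) {
    let prefix = format!("{name}-{version}/");
    let mut kept = Vec::new();
    let mut dropped_bytes = 0;
    for entry in entries {
        // Entries should live under `<name>-<version>/`; anything else is left as-is.
        let rel = entry.path.strip_prefix(prefix.as_str()).unwrap_or(entry.path);
        let top = rel.split('/').next().unwrap_or("");
        // Only drop files inside a denylisted directory, not top-level files.
        let in_subdir = rel.contains('/');
        if in_subdir && DROPPED.iter().any(|d| *d == top) {
            dropped_bytes += entry.size;
        } else {
            kept.push(entry.path);
        }
    }
    (kept, dropped_bytes)
}

fn main() {
    let entries = [
        Entry { path: "demo-0.1.0/Cargo.toml", size: 1_024 },
        Entry { path: "demo-0.1.0/src/lib.rs", size: 8_192 },
        Entry { path: "demo-0.1.0/tests/integration.rs", size: 4_096 },
    ];
    let (kept, dropped) = trim("demo", "0.1.0", &entries);
    println!("kept {kept:?}, dropped {dropped} bytes");
}
```

Because the trimmed archive's bytes differ from the published .crate, its checksum would no longer match the one recorded in the registry index and in Cargo.lock, which is why this could not silently replace the existing download.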

I have no data showing that the bandwidth savings would be enough to offset either the compute or the storage costs. If anyone knows (the infra team?), I would be happy to be proven wrong!


Labels: C-feature-request, Command-package, Command-publish, S-needs-design
