
git submodules are not cached #7987

Open
@ehuss


Problem
If a package has a git dependency with a large submodule, any change to the git repo that updates the submodule causes the entire submodule repo to be re-downloaded from scratch, and a complete separate copy is retained. This can be very expensive in both download time and disk space.

Steps

  1. In a blank project, add the dependency: rocksdb = {git = "https://github.com/tikv/rust-rocksdb.git", rev="fe7be35ba191684c989effdc6ee8e39a3978e650"}
  2. cargo fetch
  3. Change rev to 3cd18c44d160a3cdba586d6502d51b7cc67efc59
  4. cargo fetch
  5. Notice that it downloaded the entire submodule https://github.com/tikv/rocksdb.git, which is about 100 MB.
  6. Change rev to 5adf5b847e13cea2a59a1b4921aa5bf38591d1a3
  7. cargo fetch
  8. Notice that it downloaded yet another full copy.
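A rough cost model of the steps above, assuming (as the report observes) that each new parent revision triggers a fresh, full clone of the roughly 100 MB submodule and that every copy stays on disk. The function `current_cost_mb` and the flat 100 MB size are illustrative assumptions, not measurements of Cargo's actual behavior:

```rust
use std::collections::HashSet;

/// Approximate size of the rocksdb submodule reported in the issue, in MB.
const SUBMODULE_MB: u64 = 100;

/// Hypothetical model of the reported behavior: each distinct parent revision
/// gets its own checkout, and each checkout clones the submodule from scratch.
/// Every copy is kept, so the result is both network traffic and disk usage.
/// Re-fetching a revision already seen is assumed to reuse its checkout.
fn current_cost_mb(revisions: &[&str], submodule_mb: u64) -> u64 {
    let distinct: HashSet<&str> = revisions.iter().copied().collect();
    distinct.len() as u64 * submodule_mb
}

fn main() {
    // The three `rev` values used in the reproduction steps.
    let revs = [
        "fe7be35ba191684c989effdc6ee8e39a3978e650",
        "3cd18c44d160a3cdba586d6502d51b7cc67efc59",
        "5adf5b847e13cea2a59a1b4921aa5bf38591d1a3",
    ];
    let total = current_cost_mb(&revs, SUBMODULE_MB);
    println!("downloaded and retained: ~{total} MB");
}
```

Under these assumptions the three revisions cost about 300 MB of both download and disk, whereas a shared cache would ideally pay for the full submodule history only once.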

Possible Solution(s)
The repo in git/db/… should probably contain the submodule. Currently, cargo appears to check out a fresh copy of the submodule for every commit in git/checkout/…. I think this is because cargo is using Submodule::open here. I wonder whether using Submodule::update would solve this.
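A minimal sketch of the idea behind the proposed solution: keep one submodule database per URL (in the spirit of git/db/…) so that checkouts for later parent revisions reuse objects that were already fetched. This is a toy model, not Cargo's code and not the git2 API; `SubmoduleDb`, `ensure`, `FetchKind`, and the submodule commit names are all hypothetical:

```rust
use std::collections::{HashMap, HashSet};

/// What the (hypothetical) cache had to do to make a submodule commit available.
#[derive(Debug, PartialEq)]
enum FetchKind {
    /// First time this submodule URL is seen: the whole repo is downloaded.
    FullClone,
    /// The repo is already cached; only the missing commit is fetched.
    Incremental,
    /// The commit is already in the cache; no network access is needed.
    AlreadyCached,
}

/// Hypothetical shared database: one entry per submodule URL, recording the
/// commits already fetched, so every checkout can be made from local objects.
#[derive(Default)]
struct SubmoduleDb {
    repos: HashMap<String, HashSet<String>>,
}

impl SubmoduleDb {
    /// Make `commit` of the submodule at `url` available and report the cost.
    fn ensure(&mut self, url: &str, commit: &str) -> FetchKind {
        let is_new_repo = !self.repos.contains_key(url);
        let commits = self.repos.entry(url.to_string()).or_default();
        let is_new_commit = commits.insert(commit.to_string());
        match (is_new_repo, is_new_commit) {
            (true, _) => FetchKind::FullClone,
            (false, true) => FetchKind::Incremental,
            (false, false) => FetchKind::AlreadyCached,
        }
    }
}

fn main() {
    let mut db = SubmoduleDb::default();
    let url = "https://github.com/tikv/rocksdb.git";
    // Hypothetical submodule commits pinned by successive parent revisions.
    for commit in ["sub-a", "sub-b", "sub-c", "sub-b"] {
        println!("{commit}: {:?}", db.ensure(url, commit));
    }
}
```

Keying the cache on the submodule URL rather than on the parent checkout is the design choice that matters here: the first revision pays for a full clone, later revisions only fetch what is missing, and returning to a commit already fetched costs nothing.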

Labels
  - A-caching (Area: caching of dependencies, repositories, and build artifacts)
  - A-git (Area: anything dealing with git)
  - A-networking (Area: networking issues, curl, etc.)
  - C-bug (Category: bug)
  - S-needs-mentor (Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing.)
