
Is there any way to sync fewer merge commits when extracting a subfolder from a large repo? #1328

Open
@RalfJung

Description


I've asked this before in #952 and was told "no", but maybe it's worth asking again... As other projects that have a presence in the Rust compiler monorepo are considering josh, we're seeing initial josh syncs that add more than 10k commits from the parent repo, meaning that about 1/3 of the commits in that history are not actually from the subproject. In Miri we have accumulated at least 3500 of these commits (it's hard to reliably find them all, so this is a lower bound), which is more than a quarter of the commits in Miri. rust-analyzer seems to be doing better, "only" getting around 1500 commits in the initial sync, which is around 5% of the total commits in that history.

> When adding an external repo as a subdirectory into a monorepo (what you want to do), Josh guarantees that splitting that subdirectory back out will yield the exact same SHA-1s as the original repo. (Most, if not all, other filtering tools do not make that guarantee.)
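
The hash-chain property behind this guarantee can be illustrated with a toy model. This is a minimal sketch, not git's or josh's actual object encoding: `commit_id` and `rewrite` are hypothetical helpers that hash only a tree name, the parent ids and a message (real git also hashes an object header and author/committer lines). It shows that an extracted history reproduces the original SHA-1s only if every ancestor is reproduced exactly, so one extra commit anywhere in a commit's ancestry changes that commit's id.

```python
import hashlib


def commit_id(tree, parent_ids, message):
    """Toy commit id: SHA-1 over a tree name, the parent ids and a message.

    Simplified encoding for illustration only; real git also hashes an
    object header and author/committer lines.
    """
    payload = "\n".join(
        [f"tree {tree}", *(f"parent {p}" for p in parent_ids), "", message]
    )
    return hashlib.sha1(payload.encode()).hexdigest()


def rewrite(history):
    """Compute ids for commits given as (name, tree, parent names, message), oldest first."""
    ids = {}
    for name, tree, parents, message in history:
        ids[name] = commit_id(tree, [ids[p] for p in parents], message)
    return ids


# Hypothetical original subrepo history.
original = [
    ("a", "t1", [], "init"),
    ("b", "t2", ["a"], "feature"),
]
# An extraction that reproduces every commit exactly.
extracted_same = list(original)
# An extraction that keeps one extra parent-repo commit (same tree) in b's ancestry.
extracted_extra = [
    ("a", "t1", [], "init"),
    ("m", "t1", ["a"], "Merge from parent repo"),
    ("b", "t2", ["m"], "feature"),
]

print(rewrite(original)["b"] == rewrite(extracted_same)["b"])   # True
print(rewrite(original)["b"] == rewrite(extracted_extra)["b"])  # False
```

This is why a filter can only afford to drop or merge commits in parts of the history where no existing id has to be preserved.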

These parent-repo commits do not originate from the subrepo, so we don't need them to be preserved. But of course josh has no way of knowing that... Maybe if we could tell it which part of the history is originally from the subrepo ("this is the subrepo HEAD, everything above this, if it exists in the parent repo, must be extracted perfectly"), it could be more "sloppy" on the remaining history? But I can see how that could be anything between tricky and complete nonsense...
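
The proposal above can be sketched on a toy commit graph. This is a hypothetical illustration, not something josh implements: `subrepo_head` stands in for the proposed marker, `touches_subdir` is an assumed precomputed set of commits that change the subfolder, and the simplification rule (drop unpinned commits that don't touch the subfolder, re-parent their children onto the nearest kept ancestors, prune redundant parents) is a git-log-style history simplification swapped in for illustration. Everything reachable from `subrepo_head` is kept verbatim so its ids could still round-trip.

```python
from collections import deque


def ancestors(parents, start):
    """Return `start` plus every commit reachable from it via parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def simplify(parents, touches_subdir, subrepo_head):
    """Keep the pinned history verbatim; drop other commits that don't touch the subfolder.

    parents: commit id -> list of parent ids (toy model of the filtered history).
    touches_subdir: ids of commits that change the extracted subfolder (assumed given).
    subrepo_head: hypothetical marker for the commit that was the subrepo HEAD.
    """
    pinned = ancestors(parents, subrepo_head)  # must round-trip exactly
    keep = {c for c in parents if c in pinned or c in touches_subdir}

    def prune(ids):
        # Drop ids that are ancestors of another id in the list (redundant parents).
        return [q for q in ids
                if not any(q != r and q in ancestors(parents, r) for r in ids)]

    memo = {}

    def nearest_kept(c):
        # Nearest kept ancestors of c, skipping over dropped commits.
        if c not in memo:
            found = []
            for p in parents[c]:
                for q in ([p] if p in keep else nearest_kept(p)):
                    if q not in found:
                        found.append(q)
            memo[c] = prune(found)
        return memo[c]

    # Pinned commits keep their parents untouched; other kept commits are re-parented.
    return {c: (list(parents[c]) if c in pinned else nearest_kept(c))
            for c in parents if c in keep}


# Toy history: a and b came from the subrepo; m1, x and m2 are parent-repo
# commits that do not touch the subfolder; c and d do touch it.
history = {
    "a": [],
    "b": ["a"],          # subrepo HEAD at import time
    "m1": ["b"],
    "c": ["m1"],
    "x": ["m1"],
    "m2": ["c", "x"],    # parent-repo merge
    "d": ["m2"],
}
print(simplify(history, {"c", "d"}, "b"))
# → {'a': [], 'b': ['a'], 'c': ['b'], 'd': ['c']}
```

The recursion and the repeated `ancestors` calls are fine for a toy graph but would need an iterative, cached traversal for a history with tens of thousands of commits. Re-parented commits necessarily get new ids, so in this sketch the round-trip guarantee only holds for the pinned part of the history.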

Cc @flip1995 @lnicola
