Commit
The 'git backfill' command already assumes the '--sparse' option when
the repository uses the sparse-checkout feature. If the sparse-checkout
patterns are in cone mode, then the path-walk API restricts the set of
trees it visits to only those necessary to reach the blobs matched by
the sparse-checkout.

In some cases, users will want to download a more restrictive set of
blobs. Augment the 'git backfill' command to parse pathspecs from the
user and limit the downloaded blobs to those matching the pathspecs.

While this implementation benefits from skipping the most expensive
step of the process (downloading missing blobs), it still requires the
path-walk API to track all tree and blob IDs; the pathspec is applied
only in the final filtering step.

I attempted to apply the pathspec using the existing pattern_list
mechanisms that power the --sparse option, as that would restrict the
path-walk to only the objects required to reach the matching blob
paths. However, my initial attempt matched against every path at HEAD,
which led to cubic behavior when given a recursive pathspec such as
"t/*" in the Git repository: N paths are compared against M
sparse-checkout patterns across T versions in history.

This could be solved by more carefully constructing the pattern list to
include recursive matches when the pathspec is recognized as working
that way. The difficulty is that we would also need to add patterns
that let the parent directories leading to that recursive pattern
match. This becomes even more difficult when we recognize that some
pathspecs don't follow a simple recursive match ("*.c", "t/*/*.sh").

For now, this simple implementation is more clearly correct. The walk
could be optimized later, but that work should wait until users
demonstrate a need for the performance improvement.
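
For illustration only, the "walk everything, filter at the end" approach
described above can be sketched in Python. This is not the Git
implementation (which is written in C). The history data, the names
walk_all_blobs and filter_for_download, and the use of fnmatchcase as a
rough stand-in for Git's pathspec matching are all assumptions made for
this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical history: each version maps path -> blob ID.
HISTORY = [
    {"t/t0001.sh": "a1", "t/lib/helper.sh": "b1", "src/main.c": "c1"},
    {"t/t0001.sh": "a2", "t/lib/helper.sh": "b1", "src/main.c": "c2"},
]


def walk_all_blobs(history):
    """Track every (path, blob ID) pair in every version, unfiltered."""
    seen = {}
    for tree in history:
        for path, blob_id in tree.items():
            seen.setdefault(path, set()).add(blob_id)
    return seen


def filter_for_download(seen, pathspecs, present):
    """Apply the pathspecs only at the end, then drop blobs already present."""
    wanted = set()
    for path, blob_ids in seen.items():
        # fnmatchcase lets '*' cross '/' boundaries, so "t/*" is recursive here.
        if any(fnmatchcase(path, spec) for spec in pathspecs):
            wanted |= blob_ids - present
    return sorted(wanted)


seen = walk_all_blobs(HISTORY)
to_fetch = filter_for_download(seen, ["t/*"], present={"a1"})
print(to_fetch)
```

The walk still visits every path in every version; only the download
set shrinks, which is the tradeoff the message describes.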

Note that using the --sparse option with a cone mode sparse-checkout is
one way to reduce the size of the object walk and is compatible with
pathspec matches, so selecting a restrictive sparse-checkout could help
with any performance issues.

Signed-off-by: Derrick Stolee <stolee@gmail.com>