Skip to content

Commit f8ee11d

Browse files
committed
pack-objects: add --path-walk option
In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. The current implementation performs delta calculations while walking objects, which is not ideal for a few reasons. First, this will cause the "Enumerating objects" phase to be much longer than usual. Second, it does not take advantage of threading during the path-scoped delta calculations. Even with this lack of threading, the path-walk option is sometimes faster than the usual approach. Future changes will refactor this code to allow for threading, but that complexity is deferred until later to keep this patch as simple as possible. This new walk is incompatible with some features and is ignored by others: * Object filters are not currently integrated with the path-walk API, such as sparse-checkout or tree depth. A blobless packfile could be integrated easily, but that is deferred for later. * Server-focused features such as delta islands, shallow packs, and using a bitmap index are incompatible with the path-walk API. * The path walk API is only compatible with the --revs option, not taking object lists or pack lists over stdin. These alternative ways to specify the objects currently ignores the --path-walk option without even a warning. Future changes will create performance tests that demonstrate the power of this approach. Signed-off-by: Derrick Stolee <stolee@gmail.com>
1 parent cd360ad commit f8ee11d

File tree

4 files changed

+169
-11
lines changed

4 files changed

+169
-11
lines changed

Documentation/git-pack-objects.txt

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,8 @@ SYNOPSIS
1515
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
1616
[--cruft] [--cruft-expiration=<time>]
1717
[--stdout [--filter=<filter-spec>] | <base-name>]
18-
[--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
18+
[--shallow] [--keep-true-parents] [--[no-]sparse]
19+
[--path-walk] < <object-list>
1920

2021

2122
DESCRIPTION
@@ -345,6 +346,16 @@ raise an error.
345346
Restrict delta matches based on "islands". See DELTA ISLANDS
346347
below.
347348

349+
--path-walk::
350+
By default, `git pack-objects` walks objects in an order that
351+
presents trees and blobs in an order unrelated to the path they
352+
appear relative to a commit's root tree. The `--path-walk` option
353+
enables a different walking algorithm that organizes trees and
354+
blobs by path. This has the potential to improve delta compression
355+
especially in the presence of filenames that cause collisions in
356+
Git's default name-hash algorithm. Due to changing how the objects
357+
are walked, this option is not compatible with `--delta-islands`,
358+
`--shallow`, or `--filter`.
348359

349360
DELTA ISLANDS
350361
-------------

Documentation/technical/api-path-walk.txt

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -69,4 +69,5 @@ Examples
6969
--------
7070

7171
See example usages in:
72-
`t/helper/test-path-walk.c`
72+
`t/helper/test-path-walk.c`,
73+
`builtin/pack-objects.c`

builtin/pack-objects.c

Lines changed: 138 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,9 @@
3939
#include "promisor-remote.h"
4040
#include "pack-mtimes.h"
4141
#include "parse-options.h"
42+
#include "blob.h"
43+
#include "tree.h"
44+
#include "path-walk.h"
4245

4346
/*
4447
* Objects we are going to pack are collected in the `to_pack` structure.
@@ -215,6 +218,7 @@ static int delta_search_threads;
215218
static int pack_to_stdout;
216219
static int sparse;
217220
static int thin;
221+
static int path_walk;
218222
static int num_preferred_base;
219223
static struct progress *progress_state;
220224

@@ -4143,6 +4147,105 @@ static void mark_bitmap_preferred_tips(void)
41434147
}
41444148
}
41454149

4150+
static inline int is_oid_interesting(struct repository *repo,
4151+
struct object_id *oid)
4152+
{
4153+
struct object *o = lookup_object(repo, oid);
4154+
return o && !(o->flags & UNINTERESTING);
4155+
}
4156+
4157+
static int add_objects_by_path(const char *path,
4158+
struct oid_array *oids,
4159+
enum object_type type,
4160+
void *data)
4161+
{
4162+
struct object_entry **delta_list;
4163+
size_t oe_start = to_pack.nr_objects;
4164+
size_t oe_end;
4165+
unsigned int sub_list_size;
4166+
unsigned int *processed = data;
4167+
4168+
/*
4169+
* First, add all objects to the packing data, including the ones
4170+
* marked UNINTERESTING (translated to 'exclude') as they can be
4171+
* used as delta bases.
4172+
*/
4173+
for (size_t i = 0; i < oids->nr; i++) {
4174+
int exclude;
4175+
struct object_info oi = OBJECT_INFO_INIT;
4176+
struct object_id *oid = &oids->oid[i];
4177+
4178+
/* Skip objects that do not exist locally. */
4179+
if (exclude_promisor_objects &&
4180+
oid_object_info_extended(the_repository, oid, &oi,
4181+
OBJECT_INFO_FOR_PREFETCH) < 0)
4182+
continue;
4183+
4184+
exclude = !is_oid_interesting(the_repository, oid);
4185+
4186+
if (exclude && !thin)
4187+
continue;
4188+
4189+
add_object_entry(oid, type, path, exclude);
4190+
}
4191+
4192+
oe_end = to_pack.nr_objects;
4193+
4194+
/* We can skip delta calculations if it is a no-op. */
4195+
if (oe_end == oe_start || !window)
4196+
return 0;
4197+
4198+
sub_list_size = 0;
4199+
ALLOC_ARRAY(delta_list, oe_end - oe_start);
4200+
4201+
for (size_t i = 0; i < oe_end - oe_start; i++) {
4202+
struct object_entry *entry = to_pack.objects + oe_start + i;
4203+
4204+
if (!should_attempt_deltas(entry))
4205+
continue;
4206+
4207+
delta_list[sub_list_size++] = entry;
4208+
}
4209+
4210+
/*
4211+
* Find delta bases among this list of objects that all match the same
4212+
* path. This causes the delta compression to be interleaved in the
4213+
* object walk, which can lead to confusing progress indicators. This is
4214+
* also incompatible with threaded delta calculations. In the future,
4215+
* consider creating a list of regions in the full to_pack.objects array
4216+
* that could be picked up by the threaded delta computation.
4217+
*/
4218+
if (sub_list_size && window) {
4219+
QSORT(delta_list, sub_list_size, type_size_sort);
4220+
find_deltas(delta_list, &sub_list_size, window, depth, processed);
4221+
}
4222+
4223+
free(delta_list);
4224+
return 0;
4225+
}
4226+
4227+
static void get_object_list_path_walk(struct rev_info *revs)
4228+
{
4229+
struct path_walk_info info = PATH_WALK_INFO_INIT;
4230+
unsigned int processed = 0;
4231+
4232+
info.revs = revs;
4233+
info.path_fn = add_objects_by_path;
4234+
info.path_fn_data = &processed;
4235+
revs->tag_objects = 1;
4236+
4237+
/*
4238+
* Allow the --[no-]sparse option to be interesting here, if only
4239+
* for testing purposes. Paths with no interesting objects will not
4240+
* contribute to the resulting pack, but only create noisy preferred
4241+
* base objects.
4242+
*/
4243+
info.prune_all_uninteresting = sparse;
4244+
4245+
if (walk_objects_by_path(&info))
4246+
die(_("failed to pack objects via path-walk"));
4247+
}
4248+
41464249
static void get_object_list(struct rev_info *revs, int ac, const char **av)
41474250
{
41484251
struct setup_revision_opt s_r_opt = {
@@ -4189,7 +4292,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
41894292

41904293
warn_on_object_refname_ambiguity = save_warning;
41914294

4192-
if (use_bitmap_index && !get_object_list_from_bitmap(revs))
4295+
if (use_bitmap_index && !path_walk && !get_object_list_from_bitmap(revs))
41934296
return;
41944297

41954298
if (use_delta_islands)
@@ -4198,15 +4301,19 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
41984301
if (write_bitmap_index)
41994302
mark_bitmap_preferred_tips();
42004303

4201-
if (prepare_revision_walk(revs))
4202-
die(_("revision walk setup failed"));
4203-
mark_edges_uninteresting(revs, show_edge, sparse);
4204-
42054304
if (!fn_show_object)
42064305
fn_show_object = show_object;
4207-
traverse_commit_list(revs,
4208-
show_commit, fn_show_object,
4209-
NULL);
4306+
4307+
if (path_walk) {
4308+
get_object_list_path_walk(revs);
4309+
} else {
4310+
if (prepare_revision_walk(revs))
4311+
die(_("revision walk setup failed"));
4312+
mark_edges_uninteresting(revs, show_edge, sparse);
4313+
traverse_commit_list(revs,
4314+
show_commit, fn_show_object,
4315+
NULL);
4316+
}
42104317

42114318
if (unpack_unreachable_expiration) {
42124319
revs->ignore_missing_links = 1;
@@ -4404,6 +4511,8 @@ int cmd_pack_objects(int argc,
44044511
N_("use the sparse reachability algorithm")),
44054512
OPT_BOOL(0, "thin", &thin,
44064513
N_("create thin packs")),
4514+
OPT_BOOL(0, "path-walk", &path_walk,
4515+
N_("use the path-walk API to walk objects when possible")),
44074516
OPT_BOOL(0, "shallow", &shallow,
44084517
N_("create packs suitable for shallow fetches")),
44094518
OPT_BOOL(0, "honor-pack-keep", &ignore_packed_keep_on_disk,
@@ -4484,7 +4593,27 @@ int cmd_pack_objects(int argc,
44844593
window = 0;
44854594

44864595
strvec_push(&rp, "pack-objects");
4487-
if (thin) {
4596+
4597+
if (path_walk && filter_options.choice) {
4598+
warning(_("cannot use --filter with --path-walk"));
4599+
path_walk = 0;
4600+
}
4601+
if (path_walk && use_delta_islands) {
4602+
warning(_("cannot use delta islands with --path-walk"));
4603+
path_walk = 0;
4604+
}
4605+
if (path_walk && shallow) {
4606+
warning(_("cannot use --shallow with --path-walk"));
4607+
path_walk = 0;
4608+
}
4609+
if (path_walk) {
4610+
strvec_push(&rp, "--boundary");
4611+
/*
4612+
* We must disable the bitmaps because we are removing
4613+
* the --objects / --objects-edge[-aggressive] options.
4614+
*/
4615+
use_bitmap_index = 0;
4616+
} else if (thin) {
44884617
use_internal_rev_list = 1;
44894618
strvec_push(&rp, shallow
44904619
? "--objects-edge-aggressive"

t/t5300-pack-object.sh

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -674,4 +674,21 @@ do
674674
'
675675
done
676676

677+
# Basic "repack everything" test
678+
test_expect_success '--path-walk pack everything' '
679+
git -C server rev-parse HEAD >in &&
680+
git -C server pack-objects --stdout --revs --path-walk <in >out.pack &&
681+
git -C server index-pack --stdin <out.pack
682+
'
683+
684+
# Basic "thin pack" test
685+
test_expect_success '--path-walk thin pack' '
686+
cat >in <<-EOF &&
687+
$(git -C server rev-parse HEAD)
688+
^$(git -C server rev-parse HEAD~2)
689+
EOF
690+
git -C server pack-objects --thin --stdout --revs --path-walk <in >out.pack &&
691+
git -C server index-pack --fix-thin --stdin <out.pack
692+
'
693+
677694
test_done

0 commit comments

Comments
 (0)