Close commit-graph before calling 'gc' #208

Closed
17 changes: 0 additions & 17 deletions Documentation/technical/commit-graph.txt
@@ -127,23 +127,6 @@ Design Details
   helpful for these clones, anyway. The commit-graph will not be read or
   written when shallow commits are present.

-Future Work
------------
-
-- After computing and storing generation numbers, we must make graph
-  walks aware of generation numbers to gain the performance benefits they
-  enable. This will mostly be accomplished by swapping a commit-date-ordered
-  priority queue with one ordered by generation number. The following
-  operations are important candidates:
-
-    - 'log --topo-order'
-    - 'tag --merged'
-
-- A server could provide a commit-graph file as part of the network protocol
-  to avoid extra calculations by clients. This feature is only of benefit if
-  the user is willing to trust the file, because verifying the file is correct
-  is as hard as computing it from scratch.
-
 Related Links
 -------------
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
2 changes: 1 addition & 1 deletion builtin/am.c
@@ -1800,7 +1800,7 @@ static void am_run(struct am_state *state, int resume)
*/
On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

*really* minor nit: the commit subject probably wants to have a "rename"
after the colon ;-)

The patch looks sensible to me. Since Junio asked for a sanity check
whether all of the call sites of `close_all_packs()` actually want to
close the MIDX and the commit graph, too, I'll do the "speak out loud"
type of patch review here (spoiler: all of them check out):

On Fri, 17 May 2019, Derrick Stolee via GitGitGadget wrote:

> diff --git a/builtin/am.c b/builtin/am.c
> index 58a2aef28b..9315d32d2a 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1800,7 +1800,7 @@ static void am_run(struct am_state *state, int resume)
>  	 */
>  	if (!state->rebasing) {
>  		am_destroy(state);
> -		close_all_packs(the_repository->objects);
> +		close_object_store(the_repository->objects);
>  		run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);

Here, we run `git gc --auto`, so we obviously really want to close all
read handles.

Check.

>  	}
>  }
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 50bde99618..82ce682c80 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1240,7 +1240,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	transport_disconnect(transport);
>
>  	if (option_dissociate) {
> -		close_all_packs(the_repository->objects);
> +		close_object_store(the_repository->objects);
>  		dissociate_from_references();

Here, we prepare for disassociating the reference repository specified via
`git clone --reference <directory>`. Obviously, we need to let go of all
the handles we might have open there.

Check.

>  	}
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index b620fd54b4..3aec95608f 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1670,7 +1670,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>
>  	string_list_clear(&list, 0);
>
> -	close_all_packs(the_repository->objects);
> +	close_object_store(the_repository->objects);
>
>  	argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL);

Again, a `git gc --auto` that needs closing of all read handles to the
files that might be overwritten by the garbage collection.

Check.

>  	if (verbosity < 0)
> diff --git a/builtin/gc.c b/builtin/gc.c
> index df2573f124..20c8f1bfe8 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -632,7 +632,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  	gc_before_repack();
>
>  	if (!repository_format_precious_objects) {
> -		close_all_packs(the_repository->objects);
> +		close_object_store(the_repository->objects);
>  		if (run_command_v_opt(repack.argv, RUN_GIT_CMD))

Here, we want to repack. AFAICT it is the only sane thing we can do to
invalidate whatever we read from the object store into memory.

Check.

>  			die(FAILED_RUN, repack.argv[0]);
>
> @@ -660,7 +660,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  	report_garbage = report_pack_garbage;
>  	reprepare_packed_git(the_repository);
>  	if (pack_garbage.nr > 0) {
> -		close_all_packs(the_repository->objects);
> +		close_object_store(the_repository->objects);
>  		clean_pack_garbage();

This wants to delete a number of files that are now obsolete, and it makes
sense to make sure that there are no open read handles to those anymore.
It is a bit unclear from just reading the code what types of files are
accumulated into the `pack_garbage` string list, but then, we're in the
last throes of a garbage collection, and *just* about to write a new
commit graph (if `gc.writeCommitGraph=true`), so I think it is quite okay
to close not only the packs here, but everything we opened from the object
store.

So I'd give this a check mark, too.

>  	}
>
> diff --git a/builtin/merge.c b/builtin/merge.c
> index e47d77baee..72d7a7c909 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -449,7 +449,7 @@ static void finish(struct commit *head_commit,
>  			 * We ignore errors in 'gc --auto', since the
>  			 * user should see them.
>  			 */
> -			close_all_packs(the_repository->objects);
> +			close_object_store(the_repository->objects);
>  			run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);

Obviously yet another `git gc --auto`, so yes, we need to close the object
store handles we have.

Check.

>  		}
>  	}
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 7c7bc13e91..ed30fcd633 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -328,7 +328,7 @@ static int finish_rebase(struct rebase_options *opts)
>
>  	delete_ref(NULL, "REBASE_HEAD", NULL, REF_NO_DEREF);
>  	apply_autostash(opts);
> -	close_all_packs(the_repository->objects);
> +	close_object_store(the_repository->objects);
>  	/*
>  	 * We ignore errors in 'gc --auto', since the
>  	 * user should see them.

Yet another `git gc --auto`.

Check.

> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index d58b7750b6..92cd1f508c 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -2032,7 +2032,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
>  			proc.git_cmd = 1;
>  			proc.argv = argv_gc_auto;
>
> -			close_all_packs(the_repository->objects);
> +			close_object_store(the_repository->objects);
>  			if (!start_command(&proc)) {

This `proc` refers to another `git gc --auto` (see a couple lines above,
still within the hunk).

Check.

>  				if (use_sideband)
>  					copy_to_sideband(proc.err, -1, NULL);
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 67f8978043..4de8b6600c 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -419,7 +419,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	if (!names.nr && !po_args.quiet)
>  		printf_ln(_("Nothing new to pack."));
>
> -	close_all_packs(the_repository->objects);
> +	close_object_store(the_repository->objects);
>
>  	/*
>  	 * Ok we have prepared all new packfiles.

Ah, the joys of un-dynamic patch review. What you, dear reader, cannot see
in this hunk is that the code comment at the end continues thusly:

         * First see if there are packs of the same name and if so
         * if we can move them out of the way (this can happen if we
         * repacked immediately after packing fully.
         */

Meaning: we're about to rename some pack files. So the pack file handles
need to be closed, all right, but what about the other object store
handles? There is no mention of the commit graph (more on that below), but
the loop following the code comment contains this:

                        if (!midx_cleared) {
                                clear_midx_file(the_repository);
                                midx_cleared = 1;
                        }

So yes, I would give this a check.

It does puzzle me, I have to admit, that there is no (opt-in) code block
to re-write the commit graph. After all, the commit graph references the
pack files, right? So if they are repacked, it would at least be
invalidated at this point...

> diff --git a/object.c b/object.c
> index e81d47a79c..cf1a2b7086 100644
> --- a/object.c
> +++ b/object.c
> @@ -517,7 +517,7 @@ void raw_object_store_clear(struct raw_object_store *o)
>  	o->loaded_alternates = 0;
>
>  	INIT_LIST_HEAD(&o->packed_git_mru);
> -	close_all_packs(o);
> +	close_object_store(o);

We're in the middle of a function called `raw_object_store_clear()`. So...

Check.

>  	o->packed_git = NULL;
>  }
>
> diff --git a/packfile.c b/packfile.c
> index ce12bffe3e..017046fcf9 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -337,7 +337,7 @@ void close_pack(struct packed_git *p)
>  	close_pack_index(p);
>  }
>
> -void close_all_packs(struct raw_object_store *o)
> +void close_object_store(struct raw_object_store *o)
>  {
>  	struct packed_git *p;
>
> diff --git a/packfile.h b/packfile.h
> index d70c6d9afb..e95e389eb8 100644
> --- a/packfile.h
> +++ b/packfile.h
> @@ -81,7 +81,7 @@ extern uint32_t get_pack_fanout(struct packed_git *p, uint32_t value);
>  extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
>  extern void close_pack_windows(struct packed_git *);
>  extern void close_pack(struct packed_git *);
> -extern void close_all_packs(struct raw_object_store *o);
> +extern void close_object_store(struct raw_object_store *o);
>  extern void unuse_pack(struct pack_window **);
>  extern void clear_delta_base_cache(void);
>  extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local);
> --
> gitgitgadget

And this concludes my review.

Thank you!
Dscho
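
The review above turns on one idea: the renamed function is expected to release every kind of read handle held on the object store (packs, the MIDX and the commit graph), not only the packs. The following is a minimal sketch of that idea using a toy model. `struct toy_store`, `toy_close()` and `toy_close_object_store()` are hypothetical names invented for illustration; this is not Git's `raw_object_store` or its actual `close_object_store()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an object store; not Git's raw_object_store. */
struct toy_store {
	FILE *packs[4];      /* open pack handles, NULL when unused */
	FILE *midx;          /* multi-pack-index handle, or NULL */
	FILE *commit_graph;  /* commit-graph handle, or NULL */
};

/* Close one handle if it is open and reset the slot; returns 1 if closed. */
static int toy_close(FILE **fp)
{
	if (!*fp)
		return 0;
	fclose(*fp);
	*fp = NULL;
	return 1;
}

/* Close every kind of handle the store holds; returns how many were closed. */
int toy_close_object_store(struct toy_store *s)
{
	int closed = 0;
	size_t i;

	assert(s != NULL);
	for (i = 0; i < sizeof(s->packs) / sizeof(s->packs[0]); i++)
		closed += toy_close(&s->packs[i]);
	closed += toy_close(&s->midx);
	closed += toy_close(&s->commit_graph);
	return closed;
}
```

Resetting each slot to NULL makes a second call harmless, which matters when several call sites may each close the store before spawning a child process.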

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/20/2019 6:01 AM, Johannes Schindelin wrote:
> Hi Stolee,
> 
> *really* minor nit: the commit subject probably wants to have a "rename"
> after the colon ;-)

I did put that there, but then the subject line was too long. I'm not
opposed to putting it back.
 
> The patch looks sensible to me. Since Junio asked for a sanity check
> whether all of the call sites of `close_all_packs()` actually want to
> close the MIDX and the commit graph, too, I'll do the "speak out loud"
> type of patch review here (spoiler: all of them check out):

Thanks for the detail here!

>> diff --git a/builtin/repack.c b/builtin/repack.c
>> index 67f8978043..4de8b6600c 100644
>> --- a/builtin/repack.c
>> +++ b/builtin/repack.c
>> @@ -419,7 +419,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>>  	if (!names.nr && !po_args.quiet)
>>  		printf_ln(_("Nothing new to pack."));
>>
>> -	close_all_packs(the_repository->objects);
>> +	close_object_store(the_repository->objects);
>>
>>  	/*
>>  	 * Ok we have prepared all new packfiles.
> 
> Ah, the joys of un-dynamic patch review. What you, dear reader, cannot see
> in this hunk is that the code comment at the end continues thusly:
> 
>          * First see if there are packs of the same name and if so
>          * if we can move them out of the way (this can happen if we
>          * repacked immediately after packing fully.
>          */
> 
> Meaning: we're about to rename some pack files. So the pack file handles
> need to be closed, all right, but what about the other object store
> handles? There is no mention of the commit graph (more on that below), but
> the loop following the code comment contains this:
> 
>                         if (!midx_cleared) {
>                                 clear_midx_file(the_repository);
>                                 midx_cleared = 1;
>                         }
> 
> So yes, I would give this a check.
> 
> It does puzzle me, I have to admit, that there is no (opt-in) code block
> to re-write the commit graph. After all, the commit graph references the
> pack files, right? So if they are repacked, it would at least be
> invalidated at this point...

The commit-graph does not directly reference the packs. The file will still be
valid, except if we GC'd some commits that it references. We have the ability
to rewrite the graph in 'git gc'.

The MIDX does reference packs by name, so it needs to be cleared before we delete
packs. This _could_ be done with more care: we only need to delete it if a pack
it references is queued for deletion. However, you can do that using the
'git multi-pack-index expire|repack' pattern currently cooking.

Thanks,
-Stolee
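
The "more care" mentioned above (delete the MIDX only if a pack it references is queued for deletion) can be sketched as a simple name comparison. This is an illustration under stated assumptions, not code from the patch: `midx_needs_clearing()` is a hypothetical helper, and both lists are assumed to be plain arrays of pack file names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if any pack name recorded in the (hypothetical) MIDX listing
 * also appears in the list of packs queued for deletion, else 0.
 */
int midx_needs_clearing(const char *const *midx_packs, size_t nr_midx,
			const char *const *doomed_packs, size_t nr_doomed)
{
	size_t i, j;

	/* A NULL array is only acceptable together with a zero count. */
	assert(midx_packs || !nr_midx);
	assert(doomed_packs || !nr_doomed);

	for (i = 0; i < nr_midx; i++)
		for (j = 0; j < nr_doomed; j++)
			if (!strcmp(midx_packs[i], doomed_packs[j]))
				return 1;
	return 0;
}
```

The quadratic scan keeps the sketch short; with many packs, sorting both lists or using a hash set would be the more likely choice.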

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

> On 5/20/2019 6:01 AM, Johannes Schindelin wrote:
>> Hi Stolee,
>> 
>> *really* minor nit: the commit subject probably wants to have a "rename"
>> after the colon ;-)
>
> I did put that there, but then the subject line was too long. I'm not
> opposed to putting it back.

Let me locally amend what I queued in the meantime, then.  Thanks, both.

 	if (!state->rebasing) {
 		am_destroy(state);
-		close_all_packs(the_repository->objects);
+		close_object_store(the_repository->objects);
 		run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
 	}
 }
2 changes: 1 addition & 1 deletion builtin/clone.c
@@ -1240,7 +1240,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	transport_disconnect(transport);

 	if (option_dissociate) {
-		close_all_packs(the_repository->objects);
+		close_object_store(the_repository->objects);
 		dissociate_from_references();
 	}

21 changes: 11 additions & 10 deletions builtin/commit-graph.c
@@ -141,6 +141,8 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *pack_indexes = NULL;
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
+	int result;
+	int flags = COMMIT_GRAPH_PROGRESS;

 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -165,13 +167,13 @@ static int graph_write(int argc, const char **argv)
 		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
+	if (opts.append)
+		flags |= COMMIT_GRAPH_APPEND;

 	read_replace_refs = 0;

-	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
-		return 0;
-	}
+	if (opts.reachable)
+		return write_commit_graph_reachable(opts.obj_dir, flags);

 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -188,14 +190,13 @@ static int graph_write(int argc, const char **argv)
 		UNLEAK(buf);
 	}

-	write_commit_graph(opts.obj_dir,
-			   pack_indexes,
-			   commit_hex,
-			   opts.append,
-			   1);
+	result = write_commit_graph(opts.obj_dir,
+				    pack_indexes,
+				    commit_hex,
+				    flags);

 	UNLEAK(lines);
-	return 0;
+	return result;
 }

 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
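
The hunks above replace two positional integer arguments (`append` and a progress switch) with a single `flags` bitmask, and they propagate the writer's return value instead of always returning 0. A minimal sketch of that pattern follows. The flag names come from the diff, but the bit values are assumptions, and `toy_write_commit_graph()`, `toy_progress_flags()` and `toy_graph_write()` are hypothetical functions, not Git's.

```c
#include <assert.h>
#include <stdio.h>

/* Flag names follow the diff; the bit values here are assumptions. */
#define COMMIT_GRAPH_APPEND   (1 << 0)
#define COMMIT_GRAPH_PROGRESS (1 << 1)

/* Hypothetical writer: returns 0 on success, non-zero on failure. */
int toy_write_commit_graph(const char *obj_dir, int flags)
{
	/* Reject bits this sketch does not know about. */
	assert(!(flags & ~(COMMIT_GRAPH_APPEND | COMMIT_GRAPH_PROGRESS)));

	if (!obj_dir || !*obj_dir)
		return 1; /* report failure to the caller instead of dying */
	if (flags & COMMIT_GRAPH_PROGRESS)
		fprintf(stderr, "writing commit-graph in %s%s\n", obj_dir,
			(flags & COMMIT_GRAPH_APPEND) ? " (append)" : "");
	return 0;
}

/* Same shape as the gc.c expression: '&&' binds tighter than '?:'. */
int toy_progress_flags(int quiet, int daemonized)
{
	return !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0;
}

/* Hypothetical caller that builds the flags and propagates the result. */
int toy_graph_write(const char *obj_dir, int append)
{
	int flags = COMMIT_GRAPH_PROGRESS;

	if (append)
		flags |= COMMIT_GRAPH_APPEND;
	return toy_write_commit_graph(obj_dir, flags);
}
```

A bitmask keeps call sites readable (`flags` instead of a bare `0, 1`) and lets new options be added without changing every signature again.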
5 changes: 3 additions & 2 deletions builtin/commit.c
@@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "new_index file. Check that disk is not full and quota is\n"
 		      "not exceeded, and then \"git reset HEAD\" to recover."));

-	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
-		write_commit_graph_reachable(get_object_directory(), 0, 0);
+	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
+	    write_commit_graph_reachable(get_object_directory(), 0))
+		return 1;

 	repo_rerere(the_repository, 0);
 	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
2 changes: 1 addition & 1 deletion builtin/fetch.c
@@ -1670,7 +1670,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)

 	string_list_clear(&list, 0);

-	close_all_packs(the_repository->objects);
+	close_object_store(the_repository->objects);

 	argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL);
 	if (verbosity < 0)
11 changes: 6 additions & 5 deletions builtin/gc.c
@@ -632,7 +632,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	gc_before_repack();

 	if (!repository_format_precious_objects) {
-		close_all_packs(the_repository->objects);
+		close_object_store(the_repository->objects);
 		if (run_command_v_opt(repack.argv, RUN_GIT_CMD))
 			die(FAILED_RUN, repack.argv[0]);

@@ -660,13 +660,14 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	report_garbage = report_pack_garbage;
 	reprepare_packed_git(the_repository);
 	if (pack_garbage.nr > 0) {
-		close_all_packs(the_repository->objects);
+		close_object_store(the_repository->objects);
 		clean_pack_garbage();
 	}

-	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0,
-					     !quiet && !daemonized);
+	if (gc_write_commit_graph &&
+	    write_commit_graph_reachable(get_object_directory(),
+					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0))
+		return 1;

 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
2 changes: 1 addition & 1 deletion builtin/merge.c
@@ -449,7 +449,7 @@ static void finish(struct commit *head_commit,
 			 * We ignore errors in 'gc --auto', since the
 			 * user should see them.
 			 */
-			close_all_packs(the_repository->objects);
+			close_object_store(the_repository->objects);
 			run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
 		}
 	}
2 changes: 1 addition & 1 deletion builtin/rebase.c
@@ -328,7 +328,7 @@ static int finish_rebase(struct rebase_options *opts)

 	delete_ref(NULL, "REBASE_HEAD", NULL, REF_NO_DEREF);
 	apply_autostash(opts);
-	close_all_packs(the_repository->objects);
+	close_object_store(the_repository->objects);
 	/*
 	 * We ignore errors in 'gc --auto', since the
 	 * user should see them.
2 changes: 1 addition & 1 deletion builtin/receive-pack.c
@@ -2032,7 +2032,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 			proc.git_cmd = 1;
 			proc.argv = argv_gc_auto;

-			close_all_packs(the_repository->objects);
+			close_object_store(the_repository->objects);
 			if (!start_command(&proc)) {
 				if (use_sideband)
 					copy_to_sideband(proc.err, -1, NULL);
2 changes: 1 addition & 1 deletion builtin/repack.c
@@ -419,7 +419,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (!names.nr && !po_args.quiet)
 		printf_ln(_("Nothing new to pack."));

-	close_all_packs(the_repository->objects);
+	close_object_store(the_repository->objects);

 	/*
 	 * Ok we have prepared all new packfiles.