Skip to content

Maintenance I: Command, gc and commit-graph tasks #695

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,7 @@
/git-ls-tree
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Martin Ågren wrote (reply to this):

On Thu, 6 Aug 2020 at 19:57, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> +DESCRIPTION
> +-----------
> +Run tasks to optimize Git repository data, speeding up other Git commands
> +and reducing storage requirements for the repository.
> ++

This "+" and the one below render literally so you would want to drop
them. (You're not in any kind of "list" here, so no need for a "list
continuation".)

> +Git commands that add repository data, such as `git add` or `git fetch`,
> +are optimized for a responsive user experience. These commands do not take
> +time to optimize the Git data, since such optimizations scale with the full
> +size of the repository while these user commands each perform a relatively
> +small action.
> ++
> +The `git maintenance` command provides flexibility for how to optimize the
> +Git repository.

Martin

/git-mailinfo
/git-mailsplit
/git-maintenance
/git-merge
/git-merge-base
/git-merge-index
Expand Down
2 changes: 2 additions & 0 deletions Documentation/config.txt
Original file line number Diff line number Diff line change
Expand Up @@ -398,6 +398,8 @@ include::config/mailinfo.txt[]

include::config/mailmap.txt[]

include::config/maintenance.txt[]

include::config/man.txt[]

include::config/merge.txt[]
Expand Down
16 changes: 16 additions & 0 deletions Documentation/config/maintenance.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
maintenance.<task>.enabled::
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

> Instead of writing a new commit-graph in every 'git maintenance run
> --auto' process (when maintenance.commit-graph.enalbed is configured to

s/enalbed/enabled/

> +/* Remember to update object flag allocation in object.h */
> +#define PARENT1		(1u<<16)

Why this name? "SEEN" seems perfectly fine.

> +static int num_commits_not_in_graph = 0;
> +static int limit_commits_not_in_graph = 100;
> +
> +static int dfs_on_ref(const char *refname,
> +		      const struct object_id *oid, int flags,
> +		      void *cb_data)
> +{

[snip]

> +static int should_write_commit_graph(void)
> +{
> +	int result;
> +
> +	git_config_get_int("maintenance.commit-graph.auto",
> +			   &limit_commits_not_in_graph);
> +
> +	if (!limit_commits_not_in_graph)
> +		return 0;
> +	if (limit_commits_not_in_graph < 0)
> +		return 1;
> +
> +	result = for_each_ref(dfs_on_ref, NULL);

I don't like introducing the mutable global num_commits_not_in_graph
especially when there seems to be at least one easy alternative (e.g.
putting it in cb_data) but I know that this is a pattern than existing
code.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/18/2020 8:09 PM, Jonathan Tan wrote:
>> Instead of writing a new commit-graph in every 'git maintenance run
>> --auto' process (when maintenance.commit-graph.enalbed is configured to
> 
> s/enalbed/enabled/
> 
>> +/* Remember to update object flag allocation in object.h */
>> +#define PARENT1		(1u<<16)
> 
> Why this name? "SEEN" seems perfectly fine.

For some reason I thought that there might be issues using
SEEN (1u<<0) but trying it locally does not seem to be a
problem. I think I've just been bit by how revision.c uses
it a bit too often. The use here is independent enough to
not cause problems.

>> +static int num_commits_not_in_graph = 0;
>> +static int limit_commits_not_in_graph = 100;
>> +
>> +static int dfs_on_ref(const char *refname,
>> +		      const struct object_id *oid, int flags,
>> +		      void *cb_data)
>> +{
> 
> [snip]
> 
>> +static int should_write_commit_graph(void)
>> +{
>> +	int result;
>> +
>> +	git_config_get_int("maintenance.commit-graph.auto",
>> +			   &limit_commits_not_in_graph);
>> +
>> +	if (!limit_commits_not_in_graph)
>> +		return 0;
>> +	if (limit_commits_not_in_graph < 0)
>> +		return 1;
>> +
>> +	result = for_each_ref(dfs_on_ref, NULL);
> 
> I don't like introducing the mutable global num_commits_not_in_graph
> especially when there seems to be at least one easy alternative (e.g.
> putting it in cb_data) but I know that this is a pattern than existing
> code.

I should have done this right in the first place. Thanks for
catching it.

-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Instead of writing a new commit-graph in every 'git maintenance run
> --auto' process (when maintenance.commit-graph.enalbed is configured to
> be true), only write when there are "enough" commits not in a
> commit-graph file.
> 
> This count is controlled by the maintenance.commit-graph.auto config
> option.
> 
> To compute the count, use a depth-first search starting at each ref, and
> leaving markers using the PARENT1 flag. If this count reaches the limit,

PARENT1 -> SEEN

Other than that, all the 11 patches look good.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jonathan Tan <jonathantanmy@google.com> writes:

>> From: Derrick Stolee <dstolee@microsoft.com>
>> 
>> Instead of writing a new commit-graph in every 'git maintenance run
>> --auto' process (when maintenance.commit-graph.enalbed is configured to
>> be true), only write when there are "enough" commits not in a
>> commit-graph file.
>> 
>> This count is controlled by the maintenance.commit-graph.auto config
>> option.
>> 
>> To compute the count, use a depth-first search starting at each ref, and
>> leaving markers using the PARENT1 flag. If this count reaches the limit,
>
> PARENT1 -> SEEN
>
> Other than that, all the 11 patches look good.

Thanks.

This boolean config option controls whether the maintenance task
with name `<task>` is run when no `--task` option is specified to
`git maintenance run`. These config values are ignored if a
`--task` option exists. By default, only `maintenance.gc.enabled`
is true.

maintenance.commit-graph.auto::
This integer config option controls how often the `commit-graph` task
should be run as part of `git maintenance run --auto`. If zero, then
the `commit-graph` task will not run with the `--auto` option. A
negative value will force the task to run every time. Otherwise, a
positive value implies the command should run when the number of
reachable commits that are not in the commit-graph file is at least
the value of `maintenance.commit-graph.auto`. The default value is
100.
6 changes: 4 additions & 2 deletions Documentation/fetch-options.txt
Original file line number Diff line number Diff line change
Expand Up @@ -95,9 +95,11 @@ ifndef::git-pull[]
Allow several <repository> and <group> arguments to be
specified. No <refspec>s may be specified.

--[no-]auto-maintenance::
--[no-]auto-gc::
Run `git gc --auto` at the end to perform garbage collection
if needed. This is enabled by default.
Run `git maintenance run --auto` at the end to perform automatic
repository maintenance if needed. (`--[no-]auto-gc` is a synonym.)
This is enabled by default.

--[no-]write-commit-graph::
Write a commit-graph after fetching. This overrides the config
Expand Down
6 changes: 3 additions & 3 deletions Documentation/git-clone.txt
Original file line number Diff line number Diff line change
Expand Up @@ -78,9 +78,9 @@ repository using this option and then delete branches (or use any
other Git command that makes any existing commit unreferenced) in the
source repository, some objects may become unreferenced (or dangling).
These objects may be removed by normal Git operations (such as `git commit`)
which automatically call `git gc --auto`. (See linkgit:git-gc[1].)
If these objects are removed and were referenced by the cloned repository,
then the cloned repository will become corrupt.
which automatically call `git maintenance run --auto`. (See
linkgit:git-maintenance[1].) If these objects are removed and were referenced
by the cloned repository, then the cloned repository will become corrupt.
+
Note that running `git repack` without the `--local` option in a repository
cloned with `--shared` will copy objects from the source repository into a pack
Expand Down
79 changes: 79 additions & 0 deletions Documentation/git-maintenance.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
git-maintenance(1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Martin Ågren wrote (reply to this):

On Thu, 6 Aug 2020 at 18:50, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 089fa4cedc..35b0be7d40 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -35,6 +35,24 @@ run::
>  TASKS
>  -----
>
> +commit-graph::
> +       The `commit-graph` job updates the `commit-graph` files incrementally,
> +       then verifies that the written data is correct. If the new layer has an
> +       issue, then the chain file is removed and the `commit-graph` is
> +       rewritten from scratch.
> ++
> +The verification only checks the top layer of the `commit-graph` chain.
> +If the incremental write merged the new commits with at least one
> +existing layer, then there is potential for on-disk corruption being
> +carried forward into the new file. This will be noticed and the new
> +commit-graph file will be clean as Git reparses the commit data from
> +the object database.

This reads somewhat awkwardly. I think what you mean is something like
"is there a risk for on-disk corruption? yes, but no: we're clever
enough to detect it and avoid it". So from a user's point of view, I
think this is too detailed.

How about

 The verification checks the top layer of the resulting `commit-graph` chain.
 This ensures that the maintenance task leaves the top layer in a shape
 that matches the object database, even if it was ostensibly constructed
 by "merging in" existing, incorrect layers.

? I don't quite like it -- it's a bit too technical -- but it describes
the end result which the user should care about -- on-disk data is
consistent and correct -- rather than how we got there.

Perhaps:

 It is ensured that the resulting "top layer" is correct. This should
 help avoid most on-disk corruption of the commit-graph and ensure
 that the commit-graph matches the object database.

Don't know whether that's entirely true though...

Food for thought, perhaps.

There's something probabilistic about this whole thing: If a low layer
is corrupt, you might "eventually" get to replace it. I suppose it could
make sense to go "verify the whole thing, drop however many top layers
we need to drop to get only correct layers, then generate a new layer
(and merge and whatnot) on top of that". But the proposed commit message
makes it fairly clear that this would have other drawbacks and that we
don't really expect corrupt layers in the first place.

Martin

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/7/2020 6:29 PM, Martin Ågren wrote:
> On Thu, 6 Aug 2020 at 18:50, Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
>> index 089fa4cedc..35b0be7d40 100644
>> --- a/Documentation/git-maintenance.txt
>> +++ b/Documentation/git-maintenance.txt
>> @@ -35,6 +35,24 @@ run::
>>  TASKS
>>  -----
>>
>> +commit-graph::
>> +       The `commit-graph` job updates the `commit-graph` files incrementally,
>> +       then verifies that the written data is correct. If the new layer has an
>> +       issue, then the chain file is removed and the `commit-graph` is
>> +       rewritten from scratch.
>> ++
>> +The verification only checks the top layer of the `commit-graph` chain.
>> +If the incremental write merged the new commits with at least one
>> +existing layer, then there is potential for on-disk corruption being
>> +carried forward into the new file. This will be noticed and the new
>> +commit-graph file will be clean as Git reparses the commit data from
>> +the object database.
> 
> This reads somewhat awkwardly. I think what you mean is something like
> "is there a risk for on-disk corruption? yes, but no: we're clever
> enough to detect it and avoid it". So from a user's point of view, I
> think this is too detailed.
> 
> How about
> 
>  The verification checks the top layer of the resulting `commit-graph` chain.
>  This ensures that the maintenance task leaves the top layer in a shape
>  that matches the object database, even if it was ostensibly constructed
>  by "merging in" existing, incorrect layers.
> 
> ? I don't quite like it -- it's a bit too technical -- but it describes
> the end result which the user should care about -- on-disk data is
> consistent and correct -- rather than how we got there.
> 
> Perhaps:
> 
>  It is ensured that the resulting "top layer" is correct. This should
>  help avoid most on-disk corruption of the commit-graph and ensure
>  that the commit-graph matches the object database.
> 
> Don't know whether that's entirely true though...

This is my understanding. We focus on the data we just wrote to see
if anything went wrong at a lower level (i.e. filesystem or RAM
corruption during the write process) instead of the data "at rest".

> Food for thought, perhaps.
> 
> There's something probabilistic about this whole thing: If a low layer
> is corrupt, you might "eventually" get to replace it. I suppose it could
> make sense to go "verify the whole thing, drop however many top layers
> we need to drop to get only correct layers, then generate a new layer
> (and merge and whatnot) on top of that". But the proposed commit message
> makes it fairly clear that this would have other drawbacks and that we
> don't really expect corrupt layers in the first place.

Right. And we want to keep the amount of work to be very small in most
cases, amortizing the cost of the big merge operations across many much
smaller runs. Perhaps a later change could introduce an option to drop
the '--shallow' option and check the entire chain, for users willing to
pay that price.

Back to the point of your comments: I'm not sure this second paragraph
is required at all in the documentation. The first paragraph already
says:

	...then verifies that the written data is correct.

This "written data" _is_ the top layer of the chain. There is probably
no reason to dig deeper into _why_ we do this in this user-facing
documentation.

So, I propose just deleting this paragraph. What do you think?

Thanks for keeping a close eye on documentation changes! I updated
my local branch to include the '+' problem from your other reply.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Martin Ågren wrote (reply to this):

On Wed, 12 Aug 2020 at 15:30, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/7/2020 6:29 PM, Martin Ågren wrote:
> > On Thu, 6 Aug 2020 at 18:50, Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> >> index 089fa4cedc..35b0be7d40 100644
> >> --- a/Documentation/git-maintenance.txt
> >> +++ b/Documentation/git-maintenance.txt
> >> @@ -35,6 +35,24 @@ run::
> >>  TASKS
> >>  -----
> >>
> >> +commit-graph::
> >> +       The `commit-graph` job updates the `commit-graph` files incrementally,
> >> +       then verifies that the written data is correct. If the new layer has an
> >> +       issue, then the chain file is removed and the `commit-graph` is
> >> +       rewritten from scratch.
> >> ++
> >> +The verification only checks the top layer of the `commit-graph` chain.
> >> +If the incremental write merged the new commits with at least one
> >> +existing layer, then there is potential for on-disk corruption being
> >> +carried forward into the new file. This will be noticed and the new
> >> +commit-graph file will be clean as Git reparses the commit data from
> >> +the object database.
> >
> > This reads somewhat awkwardly. I think what you mean is something like
> > "is there a risk for on-disk corruption? yes, but no: we're clever
> > enough to detect it and avoid it". So from a user's point of view, I
> > think this is too detailed.

[snip quite a bit]

> Back to the point of your comments: I'm not sure this second paragraph
> is required at all in the documentation. The first paragraph already
> says:
>
>         ...then verifies that the written data is correct.
>
> This "written data" _is_ the top layer of the chain. There is probably
> no reason to dig deeper into _why_ we do this in this user-facing
> documentation.
>
> So, I propose just deleting this paragraph. What do you think?

Yeah, that makes lots of sense. Thanks!


Martin

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

> By using 'git commit-graph verify --shallow' we can ensure that
> the file we just wrote is valid. This is an extra safety precaution
> that is faster than our 'write' subcommand. In the rare situation
> that the newest layer of the commit-graph is corrupt, we can "fix"
> the corruption by deleting the commit-graph-chain file and rewrite
> the full commit-graph as a new one-layer commit graph. This does
> not completely prevent _that_ file from being corrupt, but it does
> recompute the commit-graph by parsing commits from the object
> database. In our use of this step in Scalar and VFS for Git, we
> have only seen this issue arise because our microsoft/git fork
> reverted 43d3561 ("commit-graph write: don't die if the existing
> graph is corrupt" 2019-03-25) for a while to keep commit-graph
> writes very fast. We dropped the revert when updating to v2.23.0.
> The verify still has potential for catching corrupt data across
> the layer boundary: if the new file has commit X with parent Y
> in an old file but the commit ID for Y in the old file had a
> bitswap, then we will notice that in the 'verify' command.

I'm concerned about having this extra precaution, because it is never
tested (neither here or in a future patch). When you saw this issue
arise, was there ever an instance in which a valid set of commit graph
files turned invalid after this maintenance step? (It seems from your
description that the set was invalid to begin with, so the maintenance
step did not fix it but also did not make it worse. And it does not make
it worse, that seems not too bad to me.)

Other than that, this patch looks good to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/18/2020 7:51 PM, Jonathan Tan wrote:
>> By using 'git commit-graph verify --shallow' we can ensure that
>> the file we just wrote is valid. This is an extra safety precaution
>> that is faster than our 'write' subcommand. In the rare situation
>> that the newest layer of the commit-graph is corrupt, we can "fix"
>> the corruption by deleting the commit-graph-chain file and rewrite
>> the full commit-graph as a new one-layer commit graph. This does
>> not completely prevent _that_ file from being corrupt, but it does
>> recompute the commit-graph by parsing commits from the object
>> database. In our use of this step in Scalar and VFS for Git, we
>> have only seen this issue arise because our microsoft/git fork
>> reverted 43d3561 ("commit-graph write: don't die if the existing
>> graph is corrupt" 2019-03-25) for a while to keep commit-graph
>> writes very fast. We dropped the revert when updating to v2.23.0.
>> The verify still has potential for catching corrupt data across
>> the layer boundary: if the new file has commit X with parent Y
>> in an old file but the commit ID for Y in the old file had a
>> bitswap, then we will notice that in the 'verify' command.
> 
> I'm concerned about having this extra precaution, because it is never
> tested (neither here or in a future patch). When you saw this issue
> arise, was there ever an instance in which a valid set of commit graph
> files turned invalid after this maintenance step? (It seems from your
> description that the set was invalid to begin with, so the maintenance
> step did not fix it but also did not make it worse. And it does not make
> it worse, that seems not too bad to me.)

The cases we've seen this happen were root-caused to hardware
problems (disk or RAM), and the invalid data was present immediately
after the "git commit-graph write" command. Since implementing the
"verify" step after each write, we have not had any user reports of
problems with these files.

Are you suggesting one of these options?

 1. Remove this validation and rewrite entirely.

 2. Remove the rewrite and only delete the known-bad data.

 3. Insert a way to cause the verify to return failure mid-process,
    so that this function can be covered by tests.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

> The cases we've seen this happen were root-caused to hardware
> problems (disk or RAM), and the invalid data was present immediately
> after the "git commit-graph write" command. Since implementing the
> "verify" step after each write, we have not had any user reports of
> problems with these files.
> 
> Are you suggesting one of these options?
> 
>  1. Remove this validation and rewrite entirely.
> 
>  2. Remove the rewrite and only delete the known-bad data.
> 
>  3. Insert a way to cause the verify to return failure mid-process,
>     so that this function can be covered by tests.

I was suggesting 1, although 2 and 3 would assuage my concerns also (2
less so, but just deleting the file is relatively simple and easier to
reason about). I don't think Git in general defends much against
hardware problems, but if an issue is noticeable in some hardware (as is
in this case, at least in the past) I can see why we would want to
defend against it. My concern is that if we do defend against it, but
leave it untested, several versions of Git later we might find that we
need this defense but it does not work.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

> @@ -66,6 +68,10 @@ OPTIONS
>  --quiet::
>  	Do not report progress or other information over `stderr`.
>  
> +--task=<task>::
> +	If this option is specified one or more times, then only run the
> +	specified tasks in the specified order.

We should list the accepted tasks somewhere but maybe this can wait
until after part 2.

> @@ -791,7 +791,9 @@ typedef int maintenance_task_fn(struct maintenance_opts *opts);
>  struct maintenance_task {
>  	const char *name;
>  	maintenance_task_fn *fn;
> -	unsigned enabled:1;
> +	unsigned enabled:1,
> +		 selected:1;
> +	int selected_order;
>  };

"selected" and "selected_order" are redundant in some cases - I think
this would be better if selected_order is negative if this task is not
selected, and non-negative otherwise.

Apart from that, maybe this should be documented. It is unusual (to me)
that a selection can override something being enabled, but that is the
case here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jonathan Tan <jonathantanmy@google.com> writes:

>> @@ -66,6 +68,10 @@ OPTIONS
>>  --quiet::
>>  	Do not report progress or other information over `stderr`.
>>  
>> +--task=<task>::
>> +	If this option is specified one or more times, then only run the
>> +	specified tasks in the specified order.
>
> We should list the accepted tasks somewhere but maybe this can wait
> until after part 2.
>
>> @@ -791,7 +791,9 @@ typedef int maintenance_task_fn(struct maintenance_opts *opts);
>>  struct maintenance_task {
>>  	const char *name;
>>  	maintenance_task_fn *fn;
>> -	unsigned enabled:1;
>> +	unsigned enabled:1,
>> +		 selected:1;
>> +	int selected_order;
>>  };
>
> "selected" and "selected_order" are redundant in some cases - I think
> this would be better if selected_order is negative if this task is not
> selected, and non-negative otherwise.

It is good to get rid of redundancies.

> Apart from that, maybe this should be documented. It is unusual (to me)
> that a selection can override something being enabled, but that is the
> case here.

I had the same reaction as "(to me)" above during an earlier review
round, IIRC.  This definitely deserves documentation.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/18/2020 8:36 PM, Junio C Hamano wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
>>> @@ -66,6 +68,10 @@ OPTIONS
>>>  --quiet::
>>>  	Do not report progress or other information over `stderr`.
>>>  
>>> +--task=<task>::
>>> +	If this option is specified one or more times, then only run the
>>> +	specified tasks in the specified order.
>>
>> We should list the accepted tasks somewhere but maybe this can wait
>> until after part 2.

It's hard to see from the patch-context, but there is a section titled
"TASKS" that lists the 'gc' and 'commit-graph' tasks from the earlier
patches. Would a reference to that section be helpful?

	See the TASKS section for the list of available tasks.

>>> @@ -791,7 +791,9 @@ typedef int maintenance_task_fn(struct maintenance_opts *opts);
>>>  struct maintenance_task {
>>>  	const char *name;
>>>  	maintenance_task_fn *fn;
>>> -	unsigned enabled:1;
>>> +	unsigned enabled:1,
>>> +		 selected:1;
>>> +	int selected_order;
>>>  };
>>
>> "selected" and "selected_order" are redundant in some cases - I think
>> this would be better if selected_order is negative if this task is not
>> selected, and non-negative otherwise.
> 
> It is good to get rid of redundancies.
> 
>> Apart from that, maybe this should be documented. It is unusual (to me)
>> that a selection can override something being enabled, but that is the
>> case here.
> 
> I had the same reaction as "(to me)" above during an earlier review
> round, IIRC.  This definitely deserves documentation.

Will do. Thanks.

-Stolee
 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

> On 8/18/2020 8:36 PM, Junio C Hamano wrote:
> > Jonathan Tan <jonathantanmy@google.com> writes:
> > 
> >>> @@ -66,6 +68,10 @@ OPTIONS
> >>>  --quiet::
> >>>  	Do not report progress or other information over `stderr`.
> >>>  
> >>> +--task=<task>::
> >>> +	If this option is specified one or more times, then only run the
> >>> +	specified tasks in the specified order.
> >>
> >> We should list the accepted tasks somewhere but maybe this can wait
> >> until after part 2.
> 
> It's hard to see from the patch-context, but there is a section titled
> "TASKS" that lists the 'gc' and 'commit-graph' tasks from the earlier
> patches. Would a reference to that section be helpful?
> 
> 	See the TASKS section for the list of available tasks.

Ah, I missed that. What you have now is fine then, but a reference might
be helpful if this man page ever grows.

==================

NAME
----
git-maintenance - Run tasks to optimize Git repository data


SYNOPSIS
--------
[verse]
'git maintenance' run [<options>]


DESCRIPTION
-----------
Run tasks to optimize Git repository data, speeding up other Git commands
and reducing storage requirements for the repository.

Git commands that add repository data, such as `git add` or `git fetch`,
are optimized for a responsive user experience. These commands do not take
time to optimize the Git data, since such optimizations scale with the full
size of the repository while these user commands each perform a relatively
small action.

The `git maintenance` command provides flexibility for how to optimize the
Git repository.

SUBCOMMANDS
-----------

run::
Run one or more maintenance tasks. If one or more `--task` options
are specified, then those tasks are run in that order. Otherwise,
the tasks are determined by which `maintenance.<task>.enabled`
config options are true. By default, only `maintenance.gc.enabled`
is true.

TASKS
-----

commit-graph::
The `commit-graph` job updates the `commit-graph` files incrementally,
then verifies that the written data is correct. The incremental
write is safe to run alongside concurrent Git processes since it
will not expire `.graph` files that were in the previous
`commit-graph-chain` file. They will be deleted by a later run based
on the expiration delay.

gc::
Clean up unnecessary files and optimize the local repository. "GC"
stands for "garbage collection," but this task performs many
smaller tasks. This task can be expensive for large repositories,
as it repacks all Git objects into a single pack-file. It can also
be disruptive in some situations, as it deletes stale data. See
linkgit:git-gc[1] for more details on garbage collection in Git.

OPTIONS
-------
--auto::
When combined with the `run` subcommand, run maintenance tasks
only if certain thresholds are met. For example, the `gc` task
runs when the number of loose objects exceeds the number stored
in the `gc.auto` config setting, or when the number of pack-files
exceeds the `gc.autoPackLimit` config setting.

--quiet::
Do not report progress or other information over `stderr`.

--task=<task>::
If this option is specified one or more times, then only run the
specified tasks in the specified order. If no `--task=<task>`
arguments are specified, then only the tasks with
`maintenance.<task>.enabled` configured as `true` are considered.
See the 'TASKS' section for the list of accepted `<task>` values.

GIT
---
Part of the linkgit:git[1] suite
1 change: 1 addition & 0 deletions builtin.h
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix);
int cmd_ls_remote(int argc, const char **argv, const char *prefix);
int cmd_mailinfo(int argc, const char **argv, const char *prefix);
int cmd_mailsplit(int argc, const char **argv, const char *prefix);
int cmd_maintenance(int argc, const char **argv, const char *prefix);
int cmd_merge(int argc, const char **argv, const char *prefix);
int cmd_merge_base(int argc, const char **argv, const char *prefix);
int cmd_merge_index(int argc, const char **argv, const char *prefix);
Expand Down
2 changes: 1 addition & 1 deletion builtin/am.c
Original file line number Diff line number Diff line change
Expand Up @@ -1795,7 +1795,7 @@ static void am_run(struct am_state *state, int resume)
if (!state->rebasing) {
am_destroy(state);
close_object_store(the_repository->objects);
run_auto_gc(state->quiet);
run_auto_maintenance(state->quiet);
}
}

Expand Down
2 changes: 1 addition & 1 deletion builtin/commit.c
Original file line number Diff line number Diff line change
Expand Up @@ -1702,7 +1702,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
git_test_write_commit_graph_or_die();

repo_rerere(the_repository, 0);
run_auto_gc(quiet);
run_auto_maintenance(quiet);
run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
if (amend && !no_post_rewrite) {
commit_post_rewrite(the_repository, current_head, &oid);
Expand Down
6 changes: 4 additions & 2 deletions builtin/fetch.c
Original file line number Diff line number Diff line change
Expand Up @@ -199,8 +199,10 @@ static struct option builtin_fetch_options[] = {
OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
N_("report that we have only objects reachable from this object")),
OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
OPT_BOOL(0, "auto-maintenance", &enable_auto_gc,
N_("run 'maintenance --auto' after fetching")),
OPT_BOOL(0, "auto-gc", &enable_auto_gc,
N_("run 'gc --auto' after fetching")),
N_("run 'maintenance --auto' after fetching")),
OPT_BOOL(0, "show-forced-updates", &fetch_show_forced_updates,
N_("check for forced-updates on all updated branches")),
OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph,
Expand Down Expand Up @@ -1891,7 +1893,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
close_object_store(the_repository->objects);

if (enable_auto_gc)
run_auto_gc(verbosity < 0);
run_auto_maintenance(verbosity < 0);

return result;
}
Loading