Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create 'feature.*' config area and some centralized config parsing #292

Closed

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Jul 22, 2019

This is a brand-new thread to replace ds/early-access. The discussion on that
thread was very helpful [1].

With this in mind, I propose instead a set of "feature.*" config settings
that form groups of "community recommended" settings (with some caveats).
In the space below, I'll list a set of possible feature names and the
implied config options.

First, the main two categories we've discussed so far: many commits and
many files. These two feature sets are for when your repo is large in
one of these dimensions. Perhaps there are other settings to include
in these?

feature.manyFiles:
	index.version = 4
	core.untrackedCache = true

UPDATE: Instead of "feature.manyCommits", this series includes making
core.commitGraph and gc.writeCommitGraph true by default!

The other category that has been discussed already is that of "experimental
features that we generally think are helpful but change behavior slightly in
some cases".

feature.experimental:
	pack.useSparse = true
	merge.directoryRenames = true
	fetch.negotiationAlgorithm = skipping

Specifically, this setting is for config values we are not sure will ever
be on by default, but additional testing is needed to be sure. This
is different than a possible 'feature.preview' setting that would include
config settings that we are committed to updating the defaults in
a future release. There are many ways we can take this idea in the
future (including more additions to these categories).

Thanks,
-Stolee

Updates in V2: I'm responding to Dscho's comments early, as they
were very valuable.

  • Rearranged how we are checking for the feature.* settings, so if
    one layer turns the setting on and a later layer turns it off, we do
    not adjust the defaults.

  • Switched to using enums for the non-boolean config options.

  • Placed the repo_settings struct directly in the repository struct.

  • All struct members are initialized to -1 using memset.

  • The config changes around directory rename detection is greatly simplified.

Updates in V3: These are more Dscho comments, and are quite small.

  • Remove repo-settings.h by placing the struct and function definitions in repository.h.

  • Move the initialization int into "struct repo_settings".

Updates in V4:

  • Remove feature.manyCommits in favor of turning commit-graph on by default.

  • Update docs and 0 constant in feature.experimental around merge.directoryRenames.

[1] https://public-inbox.org/git/pull.254.git.gitgitgadget@gmail.com/

Cc: Johannes.Schindelin@gmx.de, peff@peff.net, jnareb@gmail.com, pclouds@gmail.com, carenas@gmail.com, avarab@gmail.com
cc: "brian m. carlson" sandals@crustytoothpaste.net
cc: Phillip Wood phillip.wood123@gmail.com

@derrickstolee derrickstolee force-pushed the repo-settings/head branch 2 times, most recently from bbe9637 to da76859 Compare July 22, 2019 17:26
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 22, 2019

Submitted as pull.292.git.gitgitgadget@gmail.com

@@ -964,6 +964,7 @@ LIB_OBJS += refspec.o
LIB_OBJS += ref-filter.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Mon, 22 Jul 2019, Derrick Stolee via GitGitGadget wrote:

> diff --git a/builtin/gc.c b/builtin/gc.c
> index c18efadda5..243be2907b 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -27,6 +27,7 @@
>  #include "pack-objects.h"
>  #include "blob.h"
>  #include "tree.h"
> +#include "repo-settings.h"
>
>  #define FAILED_RUN "failed to run %s"
>
> @@ -41,7 +42,6 @@ static int aggressive_depth =3D 50;
>  static int aggressive_window =3D 250;
>  static int gc_auto_threshold =3D 6700;
>  static int gc_auto_pack_limit =3D 50;
> -static int gc_write_commit_graph;

I _really_ like that direction. Anything that removes global state will
improve Git's source code.

> [...]
> diff --git a/read-cache.c b/read-cache.c
> index c701f7f8b8..ee1aaa8917 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> [...]
> @@ -2765,7 +2767,7 @@ static int do_write_index(struct index_state *ista=
te, struct tempfile *tempfile,
>  	}
>
>  	if (!istate->version) {
> -		istate->version =3D get_index_format_default();
> +		istate->version =3D get_index_format_default(the_repository);

It is too bad that `read-cache.h` is not `the_repository`-free at the
moment...

>  		if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
>  			init_split_index(istate);
>  	}
> diff --git a/repo-settings.c b/repo-settings.c
> new file mode 100644
> index 0000000000..13a9128f62
> --- /dev/null
> +++ b/repo-settings.c
> @@ -0,0 +1,44 @@
> +#include "cache.h"
> +#include "repository.h"
> +#include "config.h"
> +#include "repo-settings.h"
> +
> +static int git_repo_config(const char *key, const char *value, void *cb=
)
> +{
> +	struct repo_settings *rs =3D (struct repo_settings *)cb;
> +
> +	if (!strcmp(key, "core.commitgraph")) {
> +		rs->core_commit_graph =3D git_config_bool(key, value);
> +		return 0;
> +	}
> +	if (!strcmp(key, "gc.writecommitgraph")) {
> +		rs->gc_write_commit_graph =3D git_config_bool(key, value);
> +		return 0;
> +	}
> +	if (!strcmp(key, "pack.usesparse")) {
> +		rs->pack_use_sparse =3D git_config_bool(key, value);
> +		return 0;
> +	}
> +	if (!strcmp(key, "index.version")) {
> +		rs->index_version =3D git_config_int(key, value);
> +		return 0;
> +	}

I would actually prefer to use the `repo_config_get_*()` family here.
That way, we really avoid re-parsing the config.

> +
> +	return 1;
> +}
> +
> +void prepare_repo_settings(struct repository *r)
> +{
> +	if (r->settings)
> +		return;
> +
> +	r->settings =3D xmalloc(sizeof(*r->settings));
> +
> +	/* Defaults */
> +	r->settings->core_commit_graph =3D -1;
> +	r->settings->gc_write_commit_graph =3D -1;
> +	r->settings->pack_use_sparse =3D -1;
> +	r->settings->index_version =3D -1;
> +
> +	repo_config(r, git_repo_config, r->settings);
> +}
> diff --git a/repo-settings.h b/repo-settings.h
> new file mode 100644
> index 0000000000..1151c2193a
> --- /dev/null
> +++ b/repo-settings.h
> @@ -0,0 +1,15 @@
> +#ifndef REPO_SETTINGS_H
> +#define REPO_SETTINGS_H
> +
> +struct repo_settings {
> +	int core_commit_graph;
> +	int gc_write_commit_graph;
> +	int pack_use_sparse;
> +	int index_version;
> +};
> +
> +struct repository;
> +
> +void prepare_repo_settings(struct repository *r);

Hmm. I can see that you wanted to encapsulate this, but I do not really
agree that this needs to be encapsulated away from `repository.h`. I'd
rather declare `struct repo_settings` in `repository.h` and then make
the `settings` a field of that type (as opposed to a pointer to that
type). In general, I like to avoid unnecessary `malloc()`s, and this
here instance is one of them.

Thanks,
Dscho

> +
> +#endif /* REPO_SETTINGS_H */
> diff --git a/repository.h b/repository.h
> index 4fb6a5885f..352afc9cd8 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -4,6 +4,7 @@
>  #include "path.h"
>
>  struct config_set;
> +struct repo_settings;
>  struct git_hash_algo;
>  struct index_state;
>  struct lock_file;
> @@ -72,6 +73,8 @@ struct repository {
>  	 */
>  	char *submodule_prefix;
>
> +	struct repo_settings *settings;
> +
>  	/* Subsystems */
>  	/*
>  	 * Repository's config which contains key-value pairs from the usual
> --
> gitgitgadget
>
>

@@ -345,6 +345,8 @@ include::config/difftool.txt[]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Mon, 22 Jul 2019, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> When a repo has many commits, it is helpful to write and read the
> commit-graph file. Future changes to Git may include new config
> settings that are benefitial in this scenario.

s/benefitial/beneficial/

>
> Create the 'feature.manyCommits' config setting that changes the
> default values of 'core.commitGraph' and 'gc.writeCommitGraph' to
> true.

Great!

> diff --git a/repo-settings.c b/repo-settings.c
> index 13a9128f62..f328602fd7 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -3,10 +3,17 @@
>  #include "config.h"
>  #include "repo-settings.h"
>
> +#define UPDATE_DEFAULT(s,v) do { if (s =3D=3D -1) { s =3D v; } } while(=
0)
> +
>  static int git_repo_config(const char *key, const char *value, void *cb=
)
>  {
>  	struct repo_settings *rs =3D (struct repo_settings *)cb;
>
> +	if (!strcmp(key, "feature.manycommits")) {
> +		UPDATE_DEFAULT(rs->core_commit_graph, 1);
> +		UPDATE_DEFAULT(rs->gc_write_commit_graph, 1);
> +		return 0;
> +	}
>  	if (!strcmp(key, "core.commitgraph")) {
>  		rs->core_commit_graph =3D git_config_bool(key, value);
>  		return 0;

Okay, this one is tricky. The behavior I would want is for
`feature.manycommits` to override the _default_. And if I set
`feature.manycommits =3D false` (e.g. via `git -c
feature.manycommits=3Dfalse ...`), I would want the default to "go back".

So I'd really rather see this as

	if (!repo_config_get_bool(r, "feature.manycommits", &b) && b) {
		rs->core_commit_graph =3D 1;
		rs->gc_write_commit_graph =3D 1;
	}

	[...]

	repo_config_get_bool(r, "core.commitgraph", &rs->core_commit_graph);

I.e. override the default only if `feature.manyCommits` was true *in the
end*.

In any case, we really need to make sure to handle the `=3D false` case
correctly here. We might want to override the setting from the
command-line, or it might be set in `~/.gitconfig` and need to be
overridden in a local repository. We need to handle it.

Otherwise this looks good!
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2019 10:53 AM, Johannes Schindelin wrote:
> Hi Stolee,
> 
> On Mon, 22 Jul 2019, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When a repo has many commits, it is helpful to write and read the
>> commit-graph file. Future changes to Git may include new config
>> settings that are benefitial in this scenario.
> 
> s/benefitial/beneficial/
> 
>>
>> Create the 'feature.manyCommits' config setting that changes the
>> default values of 'core.commitGraph' and 'gc.writeCommitGraph' to
>> true.
> 
> Great!
> 
>> diff --git a/repo-settings.c b/repo-settings.c
>> index 13a9128f62..f328602fd7 100644
>> --- a/repo-settings.c
>> +++ b/repo-settings.c
>> @@ -3,10 +3,17 @@
>>  #include "config.h"
>>  #include "repo-settings.h"
>>
>> +#define UPDATE_DEFAULT(s,v) do { if (s == -1) { s = v; } } while(0)
>> +
>>  static int git_repo_config(const char *key, const char *value, void *cb)
>>  {
>>  	struct repo_settings *rs = (struct repo_settings *)cb;
>>
>> +	if (!strcmp(key, "feature.manycommits")) {
>> +		UPDATE_DEFAULT(rs->core_commit_graph, 1);
>> +		UPDATE_DEFAULT(rs->gc_write_commit_graph, 1);
>> +		return 0;
>> +	}
>>  	if (!strcmp(key, "core.commitgraph")) {
>>  		rs->core_commit_graph = git_config_bool(key, value);
>>  		return 0;
> 
> Okay, this one is tricky. The behavior I would want is for
> `feature.manycommits` to override the _default_. And if I set
> `feature.manycommits = false` (e.g. via `git -c
> feature.manycommits=false ...`), I would want the default to "go back".
> 
> So I'd really rather see this as
> 
> 	if (!repo_config_get_bool(r, "feature.manycommits", &b) && b) {
> 		rs->core_commit_graph = 1;
> 		rs->gc_write_commit_graph = 1;
> 	}
> 
> 	[...]
> 
> 	repo_config_get_bool(r, "core.commitgraph", &rs->core_commit_graph);
> 
> I.e. override the default only if `feature.manyCommits` was true *in the
> end*.
> 
> In any case, we really need to make sure to handle the `= false` case
> correctly here. We might want to override the setting from the
> command-line, or it might be set in `~/.gitconfig` and need to be
> overridden in a local repository. We need to handle it.

I had forgotten about this interaction. Thanks for finding it!

The way I would fix it is to add an "int feature_many_commits" to
the repo_settings struct, then check its value at the very end of
prepare_repo_settings() and then UPDATE_DEFAULT() for the settings
we want.

Thanks!
-Stolee

@@ -18,6 +18,7 @@
#include "dir.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Mon, 22 Jul 2019, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The core.untrackedCache config setting is slightly complicated,
> so clarify its use and centralize its parsing into the repo
> settings.
>
> The default value is "keep" (returned as -1), which persists the
> untracked cache if it exists.
>
> If the value is set as "false" (returned as 0), then remove the
> untracked cache if it exists.
>
> If the value is set as "true" (returned as 1), then write the
> untracked cache and persist it.
>
> Instead of relying on magic values of -1, 0, and 1, split these
> options into bitflags CORE_UNTRACKED_CACHE_KEEP and
> CORE_UNTRACKED_CACHE_WRITE. This allows the use of "-1" as a
> default value. After parsing the config options, if the value is
> unset we can initialize it to CORE_UNTRACKED_CACHE_KEEP.

Nice!

> [...]
> diff --git a/read-cache.c b/read-cache.c
> index ee1aaa8917..e67e6f6e3e 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1846,18 +1846,17 @@ static void check_ce_order(struct index_state *i=
state)
>
>  static void tweak_untracked_cache(struct index_state *istate)
>  {
> -	switch (git_config_get_untracked_cache()) {
> -	case -1: /* keep: do nothing */
> -		break;
> -	case 0: /* false */
> +	struct repository *r =3D the_repository;
> +
> +	prepare_repo_settings(r);
> +
> +	if (!(r->settings->core_untracked_cache & CORE_UNTRACKED_CACHE_KEEP)) =
{
>  		remove_untracked_cache(istate);
> -		break;
> -	case 1: /* true */
> -		add_untracked_cache(istate);
> -		break;
> -	default: /* unknown value: do nothing */
> -		break;
> +		return;
>  	}
> +
> +	if (r->settings->core_untracked_cache & CORE_UNTRACKED_CACHE_WRITE)
> +		add_untracked_cache(istate);

This changes the flow in a subtle way: in the
`CORE_UNTRACKED_CACHE_WRITE` case, we used to _not_ remove the untracked
cache, but now we do.

I _think_ what you would want to do is replace the `!(..._KEEP)`
condition by `..._REMOVE`.

>  }
>
>  static void tweak_split_index(struct index_state *istate)
> diff --git a/repo-settings.c b/repo-settings.c
> index f328602fd7..807c5a29d6 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -30,6 +30,20 @@ static int git_repo_config(const char *key, const cha=
r *value, void *cb)
>  		rs->index_version =3D git_config_int(key, value);
>  		return 0;
>  	}
> +	if (!strcmp(key, "core.untrackedcache")) {
> +		int bool_value =3D git_parse_maybe_bool(value);
> +		if (bool_value =3D=3D 0)
> +			rs->core_untracked_cache =3D 0;
> +		else if (bool_value =3D=3D 1)
> +			rs->core_untracked_cache =3D CORE_UNTRACKED_CACHE_KEEP |
> +						   CORE_UNTRACKED_CACHE_WRITE;
> +		else if (!strcasecmp(value, "keep"))
> +			rs->core_untracked_cache =3D CORE_UNTRACKED_CACHE_KEEP;
> +		else
> +			error(_("unknown core.untrackedCache value '%s'; "
> +				"using 'keep' default value"), value);
> +		return 0;
> +	}
>
>  	return 1;
>  }
> @@ -46,6 +60,13 @@ void prepare_repo_settings(struct repository *r)
>  	r->settings->gc_write_commit_graph =3D -1;
>  	r->settings->pack_use_sparse =3D -1;
>  	r->settings->index_version =3D -1;
> +	r->settings->core_untracked_cache =3D -1;

At this point at the latest, I am starting to wonder whether it would
not make more sense to use `memset(..., -1, sizeof(struct
repo_settings)` instead.

>
>  	repo_config(r, git_repo_config, r->settings);
> +
> +	/* Hack for test programs like test-dump-untracked-cache */
> +	if (ignore_untracked_cache_config)
> +		r->settings->core_untracked_cache =3D CORE_UNTRACKED_CACHE_KEEP;
> +	else
> +		UPDATE_DEFAULT(r->settings->core_untracked_cache, CORE_UNTRACKED_CACH=
E_KEEP);
>  }
> diff --git a/repo-settings.h b/repo-settings.h
> index 1151c2193a..bac9b87d49 100644
> --- a/repo-settings.h
> +++ b/repo-settings.h
> @@ -1,11 +1,15 @@
>  #ifndef REPO_SETTINGS_H
>  #define REPO_SETTINGS_H
>
> +#define CORE_UNTRACKED_CACHE_WRITE (1 << 0)
> +#define CORE_UNTRACKED_CACHE_KEEP (1 << 1)

I think it would read even nicer as an enum. In any case, using `1<<1`
suggests that this is a bit field, but I don't think that is what we
actually want here. Instead, what `core_untracked_cache` seems to be (at
least from my point of view) is a mode, where any two modes are mutually
exclusive.

For example, what is the difference between `(_KEEP | _WRITE)` and
`(_WRITE)` supposed to be? That the latter writes a fresh untracked
cache without looking at the previous one? That's not how the existing
code behaves, though...

Ciao,
Dscho

> +
>  struct repo_settings {
>  	int core_commit_graph;
>  	int gc_write_commit_graph;
>  	int pack_use_sparse;
>  	int index_version;
> +	int core_untracked_cache;
>  };
>
>  struct repository;
> --
> gitgitgadget
>
>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):



On 7/23/2019 11:04 AM, Johannes Schindelin wrote:
> Hi Stolee,
> 
> On Mon, 22 Jul 2019, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The core.untrackedCache config setting is slightly complicated,
>> so clarify its use and centralize its parsing into the repo
>> settings.
>>
>> The default value is "keep" (returned as -1), which persists the
>> untracked cache if it exists.
>>
>> If the value is set as "false" (returned as 0), then remove the
>> untracked cache if it exists.
>>
>> If the value is set as "true" (returned as 1), then write the
>> untracked cache and persist it.
>>
>> Instead of relying on magic values of -1, 0, and 1, split these
>> options into bitflags CORE_UNTRACKED_CACHE_KEEP and
>> CORE_UNTRACKED_CACHE_WRITE. This allows the use of "-1" as a
>> default value. After parsing the config options, if the value is
>> unset we can initialize it to CORE_UNTRACKED_CACHE_KEEP.
> 
> Nice!
> 
>> [...]
>> diff --git a/read-cache.c b/read-cache.c
>> index ee1aaa8917..e67e6f6e3e 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1846,18 +1846,17 @@ static void check_ce_order(struct index_state *istate)
>>
>>  static void tweak_untracked_cache(struct index_state *istate)
>>  {
>> -	switch (git_config_get_untracked_cache()) {
>> -	case -1: /* keep: do nothing */
>> -		break;
>> -	case 0: /* false */
>> +	struct repository *r = the_repository;
>> +
>> +	prepare_repo_settings(r);
>> +
>> +	if (!(r->settings->core_untracked_cache & CORE_UNTRACKED_CACHE_KEEP)) {
>>  		remove_untracked_cache(istate);
>> -		break;
>> -	case 1: /* true */
>> -		add_untracked_cache(istate);
>> -		break;
>> -	default: /* unknown value: do nothing */
>> -		break;
>> +		return;
>>  	}
>> +
>> +	if (r->settings->core_untracked_cache & CORE_UNTRACKED_CACHE_WRITE)
>> +		add_untracked_cache(istate);
> 
> This changes the flow in a subtle way: in the
> `CORE_UNTRACKED_CACHE_WRITE` case, we used to _not_ remove the untracked
> cache, but now we do.
> 
> I _think_ what you would want to do is replace the `!(..._KEEP)`
> condition by `..._REMOVE`.

I believe the code as written is correct, but confusing. The value is not an enum, but instead a bitflag. When the config setting is given as "true", then both _KEEP and _WRITE are set, so the flow is identical.

However, you already suggested switching to an enum, in which case using _REMOVE would be clearer.

> 
>>  }
>>
>>  static void tweak_split_index(struct index_state *istate)
>> diff --git a/repo-settings.c b/repo-settings.c
>> index f328602fd7..807c5a29d6 100644
>> --- a/repo-settings.c
>> +++ b/repo-settings.c
>> @@ -30,6 +30,20 @@ static int git_repo_config(const char *key, const char *value, void *cb)
>>  		rs->index_version = git_config_int(key, value);
>>  		return 0;
>>  	}
>> +	if (!strcmp(key, "core.untrackedcache")) {
>> +		int bool_value = git_parse_maybe_bool(value);
>> +		if (bool_value == 0)
>> +			rs->core_untracked_cache = 0;
>> +		else if (bool_value == 1)
>> +			rs->core_untracked_cache = CORE_UNTRACKED_CACHE_KEEP |
>> +						   CORE_UNTRACKED_CACHE_WRITE;
>> +		else if (!strcasecmp(value, "keep"))
>> +			rs->core_untracked_cache = CORE_UNTRACKED_CACHE_KEEP;
>> +		else
>> +			error(_("unknown core.untrackedCache value '%s'; "
>> +				"using 'keep' default value"), value);
>> +		return 0;
>> +	}
>>
>>  	return 1;
>>  }
>> @@ -46,6 +60,13 @@ void prepare_repo_settings(struct repository *r)
>>  	r->settings->gc_write_commit_graph = -1;
>>  	r->settings->pack_use_sparse = -1;
>>  	r->settings->index_version = -1;
>> +	r->settings->core_untracked_cache = -1;
> 
> At this point at the latest, I am starting to wonder whether it would
> not make more sense to use `memset(..., -1, sizeof(struct
> repo_settings)` instead.
> 
>>
>>  	repo_config(r, git_repo_config, r->settings);
>> +
>> +	/* Hack for test programs like test-dump-untracked-cache */
>> +	if (ignore_untracked_cache_config)
>> +		r->settings->core_untracked_cache = CORE_UNTRACKED_CACHE_KEEP;
>> +	else
>> +		UPDATE_DEFAULT(r->settings->core_untracked_cache, CORE_UNTRACKED_CACHE_KEEP);
>>  }
>> diff --git a/repo-settings.h b/repo-settings.h
>> index 1151c2193a..bac9b87d49 100644
>> --- a/repo-settings.h
>> +++ b/repo-settings.h
>> @@ -1,11 +1,15 @@
>>  #ifndef REPO_SETTINGS_H
>>  #define REPO_SETTINGS_H
>>
>> +#define CORE_UNTRACKED_CACHE_WRITE (1 << 0)
>> +#define CORE_UNTRACKED_CACHE_KEEP (1 << 1)
> 
> I think it would read even nicer as an enum. In any case, using `1<<1`
> suggests that this is a bit field, but I don't think that is what we
> actually want here. Instead, what `core_untracked_cache` seems to be (at
> least from my point of view) is a mode, where any two modes are mutually
> exclusive.
> 
> For example, what is the difference between `(_KEEP | _WRITE)` and
> `(_WRITE)` supposed to be? That the latter writes a fresh untracked
> cache without looking at the previous one? That's not how the existing
> code behaves, though...

Yes, there is no reason to have _WRITE without also _KEEP. An enum is better. Thanks!

> 
> Ciao,
> Dscho
> 
>> +
>>  struct repo_settings {
>>  	int core_commit_graph;
>>  	int gc_write_commit_graph;
>>  	int pack_use_sparse;
>>  	int index_version;
>> +	int core_untracked_cache;
>>  };
>>
>>  struct repository;
>> --
>> gitgitgadget
>>
>>

@@ -0,0 +1,42 @@
feature.*::
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Mon, 22 Jul 2019, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The 'feature.experimental' setting includes config options that are
> not committed to become defaults, but could use additional testing.
>
> Update the following config settings to take new defaults, and to
> use the repo_settings struct if not already using them:
>
> * 'pack.useSparse=3Dtrue'
>
> * 'merge.directoryRenames=3Dtrue'
>
> * 'fetch.negotiationAlgorithm=3Dskipping'
>
> In the case of fetch.negotiationAlgorithm, the existing logic
> would load the config option only when about to use the setting,
> so had a die() statement on an unknown string value. This is
> removed as now the config is parsed under prepare_repo_settings().
> In general, this die() is probably misplaced and not valuable.
> A test was removed that checked this die() statement executed.

Good.

> [...]
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 12300131fc..162d5a4753 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -28,6 +28,7 @@
>  #include "submodule.h"
>  #include "revision.h"
>  #include "commit-reach.h"
> +#include "repo-settings.h"
>
>  struct path_hashmap_entry {
>  	struct hashmap_entry e;
> @@ -1375,10 +1376,14 @@ static int handle_rename_via_dir(struct merge_op=
tions *opt,
>  	 * there is no content merge to do; just move the file into the
>  	 * desired final location.
>  	 */
> +	struct repository *r =3D the_repository;
>  	const struct rename *ren =3D ci->ren1;
>  	const struct diff_filespec *dest =3D ren->pair->two;
>  	char *file_path =3D dest->path;
> -	int mark_conflicted =3D (opt->detect_directory_renames =3D=3D 1);

I actually don't think that we want to do that; the `opt` parameter is
passed to the merge recursive algorithm specifically so that it can be
overridden by the caller.

Instead, what we should do, I think, is to change `init_merge_options()`
(which already gets a parameter of type `struct repository *`) so that
it does not hard-code the `detect_directory_renames` value to `1` but to
query the repo settings instead.

> +	int mark_conflicted;
> +
> +	prepare_repo_settings(r);
> +	mark_conflicted =3D (r->settings->merge_directory_renames =3D=3D MERGE=
_DIRECTORY_RENAMES_CONFLICT);
>  	assert(ren->dir_rename_original_dest);
>
>  	if (!opt->call_depth && would_lose_untracked(opt, dest->path)) {
> @@ -2850,6 +2855,7 @@ static int detect_and_process_renames(struct merge=
_options *opt,
>  				      struct string_list *entries,
>  				      struct rename_info *ri)
>  {
> +	struct repository *r =3D the_repository;
>  	struct diff_queue_struct *head_pairs, *merge_pairs;
>  	struct hashmap *dir_re_head, *dir_re_merge;
>  	int clean =3D 1;
> @@ -2863,7 +2869,8 @@ static int detect_and_process_renames(struct merge=
_options *opt,
>  	head_pairs =3D get_diffpairs(opt, common, head);
>  	merge_pairs =3D get_diffpairs(opt, common, merge);
>
> -	if (opt->detect_directory_renames) {
> +	prepare_repo_settings(r);
> +	if (r->settings->merge_directory_renames) {
>  		dir_re_head =3D get_directory_renames(head_pairs);
>  		dir_re_merge =3D get_directory_renames(merge_pairs);
>
> @@ -3112,6 +3119,7 @@ static int handle_rename_normal(struct merge_optio=
ns *opt,
>  				const struct diff_filespec *b,
>  				struct rename_conflict_info *ci)
>  {
> +	struct repository *r =3D the_repository;
>  	struct rename *ren =3D ci->ren1;
>  	struct merge_file_info mfi;
>  	int clean;
> @@ -3121,7 +3129,9 @@ static int handle_rename_normal(struct merge_optio=
ns *opt,
>  	clean =3D handle_content_merge(&mfi, opt, path, was_dirty(opt, path),
>  				     o, a, b, ci);
>
> -	if (clean && opt->detect_directory_renames =3D=3D 1 &&
> +	prepare_repo_settings(r);
> +	if (clean &&
> +	    r->settings->merge_directory_renames =3D=3D MERGE_DIRECTORY_RENAME=
S_CONFLICT &&
>  	    ren->dir_rename_original_dest) {
>  		if (update_stages(opt, path,
>  				  NULL,
> @@ -3155,6 +3165,7 @@ static void dir_rename_warning(const char *msg,
>  static int warn_about_dir_renamed_entries(struct merge_options *opt,
>  					  struct rename *ren)
>  {
> +	struct repository *r =3D the_repository;
>  	const char *msg;
>  	int clean =3D 1, is_add;
>
> @@ -3166,12 +3177,13 @@ static int warn_about_dir_renamed_entries(struct=
 merge_options *opt,
>  		return clean;
>
>  	/* Sanity checks */
> -	assert(opt->detect_directory_renames > 0);
> +	prepare_repo_settings(r);
> +	assert(r->settings->merge_directory_renames > 0);
>  	assert(ren->dir_rename_original_type =3D=3D 'A' ||
>  	       ren->dir_rename_original_type =3D=3D 'R');
>
>  	/* Check whether to treat directory renames as a conflict */
> -	clean =3D (opt->detect_directory_renames =3D=3D 2);
> +	clean =3D (r->settings->merge_directory_renames =3D=3D MERGE_DIRECTORY=
_RENAMES_TRUE);
>
>  	is_add =3D (ren->dir_rename_original_type =3D=3D 'A');
>  	if (ren->dir_rename_original_type =3D=3D 'A' && clean) {
> @@ -3662,15 +3674,6 @@ static void merge_recursive_config(struct merge_o=
ptions *opt)
>  		opt->merge_detect_rename =3D git_config_rename("merge.renames", value=
);
>  		free(value);
>  	}
> -	if (!git_config_get_string("merge.directoryrenames", &value)) {
> -		int boolval =3D git_parse_maybe_bool(value);
> -		if (0 <=3D boolval) {
> -			opt->detect_directory_renames =3D boolval ? 2 : 0;
> -		} else if (!strcasecmp(value, "conflict")) {
> -			opt->detect_directory_renames =3D 1;
> -		} /* avoid erroring on values from future versions of git */
> -		free(value);
> -	}
>  	git_config(git_xmerge_config, NULL);
>  }
>
> @@ -3687,7 +3690,6 @@ void init_merge_options(struct merge_options *opt,
>  	opt->renormalize =3D 0;
>  	opt->diff_detect_rename =3D -1;
>  	opt->merge_detect_rename =3D -1;
> -	opt->detect_directory_renames =3D 1;

In other words: here.

>  	merge_recursive_config(opt);
>  	merge_verbosity =3D getenv("GIT_MERGE_VERBOSITY");
>  	if (merge_verbosity)
> diff --git a/merge-recursive.h b/merge-recursive.h
> index c2b7bb65c6..b8eba244ee 100644
> --- a/merge-recursive.h
> +++ b/merge-recursive.h
> @@ -22,7 +22,6 @@ struct merge_options {
>  	unsigned renormalize : 1;
>  	long xdl_opts;
>  	int verbosity;
> -	int detect_directory_renames;
>  	int diff_detect_rename;
>  	int merge_detect_rename;
>  	int diff_rename_limit;
> diff --git a/repo-settings.c b/repo-settings.c
> index 9e4b8e6268..5e9249c437 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -9,6 +9,12 @@ static int git_repo_config(const char *key, const char =
*value, void *cb)
>  {
>  	struct repo_settings *rs =3D (struct repo_settings *)cb;
>
> +	if (!strcmp(key, "feature.experimental")) {
> +		UPDATE_DEFAULT(rs->pack_use_sparse, 1);
> +		UPDATE_DEFAULT(rs->merge_directory_renames, MERGE_DIRECTORY_RENAMES_T=
RUE);
> +		UPDATE_DEFAULT(rs->fetch_negotiation_algorithm, FETCH_NEGOTIATION_SKI=
PPING);
> +		return 0;
> +	}
>  	if (!strcmp(key, "feature.manycommits")) {
>  		UPDATE_DEFAULT(rs->core_commit_graph, 1);
>  		UPDATE_DEFAULT(rs->gc_write_commit_graph, 1);
> @@ -50,6 +56,23 @@ static int git_repo_config(const char *key, const cha=
r *value, void *cb)
>  				"using 'keep' default value"), value);
>  		return 0;
>  	}
> +	if (!strcmp(key, "merge.directoryrenames")) {
> +		int bool_value =3D git_parse_maybe_bool(value);
> +		if (0 <=3D bool_value) {
> +			rs->merge_directory_renames =3D bool_value ? MERGE_DIRECTORY_RENAMES=
_TRUE : 0;
> +		} else if (!strcasecmp(value, "conflict")) {
> +			rs->merge_directory_renames =3D MERGE_DIRECTORY_RENAMES_CONFLICT;
> +		}
> +		return 0;
> +	}
> +	if (!strcmp(key, "fetch.negotiationalgorithm"))	{
> +		if (!strcasecmp(value, "skipping")) {
> +			rs->fetch_negotiation_algorithm =3D FETCH_NEGOTIATION_SKIPPING;
> +		} else {
> +			rs->fetch_negotiation_algorithm =3D FETCH_NEGOTIATION_DEFAULT;
> +		}
> +		return 0;
> +	}
>
>  	return 1;
>  }
> @@ -64,10 +87,14 @@ void prepare_repo_settings(struct repository *r)
>  	/* Defaults */
>  	r->settings->core_commit_graph =3D -1;
>  	r->settings->gc_write_commit_graph =3D -1;
> -	r->settings->pack_use_sparse =3D -1;
> +
>  	r->settings->index_version =3D -1;
>  	r->settings->core_untracked_cache =3D -1;
>
> +	r->settings->pack_use_sparse =3D -1;

This reordering at this stage in the patch series is a bit confusing.
Maybe add it at the location where you want it to end up to begin with?

Or use `memset(..., -1, ...)`.

> +	r->settings->merge_directory_renames =3D -1;
> +	r->settings->fetch_negotiation_algorithm =3D -1;
> +
>  	repo_config(r, git_repo_config, r->settings);
>
>  	/* Hack for test programs like test-dump-untracked-cache */
> @@ -75,4 +102,7 @@ void prepare_repo_settings(struct repository *r)
>  		r->settings->core_untracked_cache =3D CORE_UNTRACKED_CACHE_KEEP;
>  	else
>  		UPDATE_DEFAULT(r->settings->core_untracked_cache, CORE_UNTRACKED_CACH=
E_KEEP);
> +
> +	UPDATE_DEFAULT(r->settings->merge_directory_renames, MERGE_DIRECTORY_R=
ENAMES_CONFLICT);
> +	UPDATE_DEFAULT(r->settings->fetch_negotiation_algorithm, FETCH_NEGOTIA=
TION_DEFAULT);
>  }
> diff --git a/repo-settings.h b/repo-settings.h
> index bac9b87d49..cecf7d0028 100644
> --- a/repo-settings.h
> +++ b/repo-settings.h
> @@ -4,12 +4,22 @@
>  #define CORE_UNTRACKED_CACHE_WRITE (1 << 0)
>  #define CORE_UNTRACKED_CACHE_KEEP (1 << 1)
>
> +#define MERGE_DIRECTORY_RENAMES_CONFLICT 1
> +#define MERGE_DIRECTORY_RENAMES_TRUE 2
> +
> +#define FETCH_NEGOTIATION_DEFAULT 1
> +#define FETCH_NEGOTIATION_SKIPPING 2
> +

Again, I'd prefer enums.

The rest looks pretty fine to me (I have to admit that I only glanced
over the tests, though).

Thanks!
Dscho

>  struct repo_settings {
>  	int core_commit_graph;
>  	int gc_write_commit_graph;
> -	int pack_use_sparse;
> +
>  	int index_version;
>  	int core_untracked_cache;
> +
> +	int pack_use_sparse;
> +	int merge_directory_renames;
> +	int fetch_negotiation_algorithm;
>  };
>
>  struct repository;
> diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fet=
ch-negotiator.sh
> index 8a14be51a1..f70cbcc9ca 100755
> --- a/t/t5552-skipping-fetch-negotiator.sh
> +++ b/t/t5552-skipping-fetch-negotiator.sh
> @@ -60,29 +60,6 @@ test_expect_success 'commits with no parents are sent=
 regardless of skip distanc
>  	have_not_sent c6 c4 c3
>  '
>
> -test_expect_success 'unknown fetch.negotiationAlgorithm values error ou=
t' '
> -	rm -rf server client trace &&
> -	git init server &&
> -	test_commit -C server to_fetch &&
> -
> -	git init client &&
> -	test_commit -C client on_client &&
> -	git -C client checkout on_client &&
> -
> -	test_config -C client fetch.negotiationAlgorithm invalid &&
> -	test_must_fail git -C client fetch "$(pwd)/server" 2>err &&
> -	test_i18ngrep "unknown fetch negotiation algorithm" err &&
> -
> -	# Explicit "default" value
> -	test_config -C client fetch.negotiationAlgorithm default &&
> -	git -C client -c fetch.negotiationAlgorithm=3Ddefault fetch "$(pwd)/se=
rver" &&
> -
> -	# Implementation detail: If there is nothing to fetch, we will not err=
or out
> -	test_config -C client fetch.negotiationAlgorithm invalid &&
> -	git -C client fetch "$(pwd)/server" 2>err &&
> -	test_i18ngrep ! "unknown fetch negotiation algorithm" err
> -'
> -
>  test_expect_success 'when two skips collide, favor the larger one' '
>  	rm -rf server client trace &&
>  	git init server &&
> --
> gitgitgadget
>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2019 11:20 AM, Johannes Schindelin wrote:
> Hi Stolee,
> 
> On Mon, 22 Jul 2019, Derrick Stolee via GitGitGadget wrote:
>>  struct path_hashmap_entry {
>>  	struct hashmap_entry e;
>> @@ -1375,10 +1376,14 @@ static int handle_rename_via_dir(struct merge_options *opt,
>>  	 * there is no content merge to do; just move the file into the
>>  	 * desired final location.
>>  	 */
>> +	struct repository *r = the_repository;
>>  	const struct rename *ren = ci->ren1;
>>  	const struct diff_filespec *dest = ren->pair->two;
>>  	char *file_path = dest->path;
>> -	int mark_conflicted = (opt->detect_directory_renames == 1);
> 
> I actually don't think that we want to do that; the `opt` parameter is
> passed to the merge recursive algorithm specifically so that it can be
> overridden by the caller.
> 
> Instead, what we should do, I think, is to change `init_merge_options()`
> (which already gets a parameter of type `struct repository *`) so that
> it does not hard-code the `detect_directory_renames` value to `1` but to
> query the repo settings instead.

[snip]
>> @@ -3687,7 +3690,6 @@ void init_merge_options(struct merge_options *opt,
>>  	opt->renormalize = 0;
>>  	opt->diff_detect_rename = -1;
>>  	opt->merge_detect_rename = -1;
>> -	opt->detect_directory_renames = 1;
> 
> In other words: here.

Yes, thanks! Your suggestion makes much more sense and makes the diff
much simpler. No reason to couple too closely to the config storage here.

I think you gave me enough feedback to merit a v2 before others chime
in, so I'll send one after I validate my current version.

Thanks,
-Stolee

@derrickstolee derrickstolee force-pushed the repo-settings/head branch 4 times, most recently from e68ba97 to b8a6316 Compare July 25, 2019 01:47
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 25, 2019

Submitted as pull.292.v2.git.gitgitgadget@gmail.com

@@ -964,6 +964,7 @@ LIB_OBJS += refspec.o
LIB_OBJS += ref-filter.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Wed, 24 Jul 2019, Derrick Stolee via GitGitGadget wrote:

> diff --git a/repo-settings.h b/repo-settings.h
> new file mode 100644
> index 0000000000..89fb0159bf
> --- /dev/null
> +++ b/repo-settings.h
> @@ -0,0 +1,17 @@
> +#ifndef REPO_SETTINGS_H
> +#define REPO_SETTINGS_H
> +
> +struct repo_settings {
> +	int core_commit_graph;
> +	int gc_write_commit_graph;
> +
> +	int index_version;
> +
> +	int pack_use_sparse;
> +};
> +
> +struct repository;
> +
> +void prepare_repo_settings(struct repository *r);
> +
> +#endif /* REPO_SETTINGS_H */
> diff --git a/repository.h b/repository.h
> index 4fb6a5885f..a817486825 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -2,8 +2,10 @@
>  #define REPOSITORY_H
>
>  #include "path.h"
> +#include "repo-settings.h"

I still think that the `repo_settings` struct could just as easily be
declared in `repository.h`. No need to invent a new header file.

>
>  struct config_set;
> +struct repo_settings;

In any case, this is no longer necessary.

>  struct git_hash_algo;
>  struct index_state;
>  struct lock_file;
> @@ -72,6 +74,9 @@ struct repository {
>  	 */
>  	char *submodule_prefix;
>
> +	int settings_initialized;

Or maybe

	unsigned settings_initialized:1;

?

> +	struct repo_settings settings;
> +

Or maybe even fold the `initialized` flag into that struct?

Thanks,
Dscho

>  	/* Subsystems */
>  	/*
>  	 * Repository's config which contains key-value pairs from the usual
> --
> gitgitgadget
>
>

@@ -18,6 +18,7 @@
#include "dir.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Wed, 24 Jul 2019, Derrick Stolee via GitGitGadget wrote:

> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index dff2f4b837..5ced51c1ee 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -18,6 +18,7 @@
>  #include "dir.h"
>  #include "split-index.h"
>  #include "fsmonitor.h"
> +#include "repo-settings.h"

Sorry, I only realized 10 seconds ago that these `#include`s are no
longer necessary because `repo-settings` is already included from
`repository.h` (and needs to be, because the `repository` struct now
contains a full `struct repo_settings` field, no longer just a pointer).

But really, I would just fold the `repo-settings.h` declarations
directly into `repository.h` (but keep `repo-settings.c`, of course!) as
it make everything a bit simpler.

Thanks,
Dscho

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 25, 2019

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Wed, 24 Jul 2019, Derrick Stolee via GitGitGadget wrote:

> Updates in V2: I'm responding to Dscho's comments early, as they were ve=
ry
> valuable.
>
>  * Rearranged how we are checking for the feature.* settings, so if one
>    layer turns the setting on and a later layer turns it off, we do not
>    adjust the defaults.
>
>
>  * Switched to using enums for the non-boolean config options.
>
>
>  * Placed the repo_settings struct directly in the repository struct.
>
>
>  * All struct members are initialized to -1 using memset.
>
>
>  * The config changes around directory rename detection is greatly
>    simplified.

I left only a couple of comments, and quite honestly, I am really happy
with the direction this is going.

It even leaves the door open to let users define their custom feature
groups, at least I have a couple of ideas how that could be accomplished
building on top of `repo-settings.c`.

Thanks!
Dscho

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 30, 2019

Submitted as pull.292.v3.git.gitgitgadget@gmail.com

@@ -964,6 +964,7 @@ LIB_OBJS += refspec.o
LIB_OBJS += ref-filter.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/repository.h b/repository.h
> index 4fb6a5885f..2bb2bc3eea 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -4,6 +4,7 @@
>  #include "path.h"
>  
>  struct config_set;
> +struct repo_settings;

Given that the next hunk you introduce the real thing, and nobody
refers to it until then, I do not see why we want to have a forward
declaration.

>  struct git_hash_algo;
>  struct index_state;
>  struct lock_file;
> @@ -11,6 +12,17 @@ struct pathspec;
>  struct raw_object_store;
>  struct submodule_cache;
>  
> +struct repo_settings {
> +	int initialized;
> +
> +	int core_commit_graph;
> +	int gc_write_commit_graph;
> +
> +	int index_version;
> +
> +	int pack_use_sparse;
> +};
> +
>  struct repository {
>  	/* Environment */
>  	/*
> @@ -72,6 +84,8 @@ struct repository {
>  	 */
>  	char *submodule_prefix;
>  
> +	struct repo_settings settings;
> +
>  	/* Subsystems */
>  	/*
>  	 * Repository's config which contains key-value pairs from the usual
> @@ -157,5 +171,6 @@ int repo_read_index_unmerged(struct repository *);
>   */
>  void repo_update_index_if_able(struct repository *, struct lock_file *);
>  
> +void prepare_repo_settings(struct repository *r);
>  
>  #endif /* REPOSITORY_H */

@@ -345,6 +345,8 @@ include::config/difftool.txt[]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
> new file mode 100644
> index 0000000000..f74314ae90
> --- /dev/null
> +++ b/Documentation/config/feature.txt
> @@ -0,0 +1,15 @@
> +feature.*::
> +...
> +* `gc.writeCommitGraph=true` enables writing the commit-graph file during
> +garbage collection.
> \ No newline at end of file

No newline at end of file

> diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
> index 02b92b18b5..31a5fc4f75 100644
> --- a/Documentation/config/gc.txt
> +++ b/Documentation/config/gc.txt
> @@ -63,8 +63,8 @@ gc.writeCommitGraph::
>  	If true, then gc will rewrite the commit-graph file when
>  	linkgit:git-gc[1] is run. When using `git gc --auto`
>  	the commit-graph will be updated if housekeeping is
> -	required. Default is false. See linkgit:git-commit-graph[1]
> -	for details.
> +	required. Default is false, unless `feature.manyCommits`
> +	is enabled. See linkgit:git-commit-graph[1] for details.
>  
>  gc.logExpiry::
>  	If the file gc.log exists, then `git gc --auto` will print
> diff --git a/repo-settings.c b/repo-settings.c
> index 309577f6bc..fc05f4fbc4 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -2,6 +2,8 @@
>  #include "config.h"
>  #include "repository.h"
>  
> +#define UPDATE_DEFAULT(s,v) do { if (s == -1) { s = v; } } while(0)

A few points:

 1. give 's' and 'v' a bit better name, perhaps 'slot' and 'value'?

 2. "do { if ((s) == -1) { (s) = (v); } } while(0)"

 3. When we learn to set default values for variables that are not
    boolean in the future, we will regret that we did not name it
    UPDATE_DEFAULT_BOOL(slot, value).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Tue, 30 Jul 2019, Junio C Hamano wrote:

> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +#define UPDATE_DEFAULT(s,v) do { if (s =3D=3D -1) { s =3D v; } } whil=
e(0)
>
> [...]
>  3. When we learn to set default values for variables that are not
>     boolean in the future, we will regret that we did not name it
>     UPDATE_DEFAULT_BOOL(slot, value).

On the other hand, as we never promised any kind of API (and this is not
even an internal API to begin with), it will be _easy_ to rename it in
the unlikely event that we would ever introduce non-boolean defaults to
override, wouldn't you agree?

We have plenty of precedent where patch series start by refactoring,
whether it is to rename functions or variables or files or extracting
functions. Preparing for a future that might never come strikes me as
premature optimization.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Tue, 30 Jul 2019, Junio C Hamano wrote:
>
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>> > +#define UPDATE_DEFAULT(s,v) do { if (s == -1) { s = v; } } while(0)
>>
>> [...]
>>  3. When we learn to set default values for variables that are not
>>     boolean in the future, we will regret that we did not name it
>>     UPDATE_DEFAULT_BOOL(slot, value).
>
> On the other hand, as we never promised any kind of API (and this is not
> even an internal API to begin with), it will be _easy_ to rename it in
> the unlikely event that we would ever introduce non-boolean defaults to
> override, wouldn't you agree?

I agree that it is easy to say that it is easy to rename it later
and burden somebody else with the task.

I know that the renaming itself is easy, when you limit yourself
within the scope of a single topic, whether done now or later.  I
also know that having to worry about other topics in flight has
non-zero cost.  I also know that you are not the one who will bear
it---I will be.

So from my point of view, if we can make a prediction, even with
limited knowledge that a name may need to be renamed in the future,
it is better not pick such a name and instead use one that we think
it has a better chance of surviving without needing a rename.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 30, 2019

This branch is now known as ds/feature-macros.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 30, 2019

This patch series was integrated into pu via git@51d5215.

@gitgitgadget gitgitgadget bot added the pu label Jul 30, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 30, 2019

This patch series was integrated into pu via git@c1dfead.

@@ -345,6 +345,8 @@ include::config/difftool.txt[]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Tue, Jul 30 2019, Derrick Stolee via GitGitGadget wrote:

> +feature.*::
> +	The config settings that start with `feature.` modify the defaults of
> +	a group of other config settings. These groups are created by the Git
> +	developer community as recommended defaults and are subject to change.
> +	In particular, new config options may be added with different defaults.
> +
> +feature.manyCommits::
> +	Enable config options that optimize for repos with many commits. This
> +	setting is recommended for repos with at least 100,000 commits. The
> +	new default values are:
> ++
> +* `core.commitGraph=true` enables reading the commit-graph file.
> ++
> +* `gc.writeCommitGraph=true` enables writing the commit-graph file during
> +garbage collection.

During the whole new commit graph format discussion (which has now
landed) we discussed just auto toggling this:
https://public-inbox.org/git/87zhobr4fl.fsf@evledraar.gmail.com/

This looks fine, but have we backed out of simply enabling this at this
point? I don't see why not, regardless of commit count...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/31/2019 11:01 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Jul 30 2019, Derrick Stolee via GitGitGadget wrote:
> 
>> +feature.*::
>> +	The config settings that start with `feature.` modify the defaults of
>> +	a group of other config settings. These groups are created by the Git
>> +	developer community as recommended defaults and are subject to change.
>> +	In particular, new config options may be added with different defaults.
>> +
>> +feature.manyCommits::
>> +	Enable config options that optimize for repos with many commits. This
>> +	setting is recommended for repos with at least 100,000 commits. The
>> +	new default values are:
>> ++
>> +* `core.commitGraph=true` enables reading the commit-graph file.
>> ++
>> +* `gc.writeCommitGraph=true` enables writing the commit-graph file during
>> +garbage collection.
> 
> During the whole new commit graph format discussion (which has now
> landed) we discussed just auto toggling this:
> https://public-inbox.org/git/87zhobr4fl.fsf@evledraar.gmail.com/
> 
> This looks fine, but have we backed out of simply enabling this at this
> point? I don't see why not, regardless of commit count...

I would be happy to drop feature.manyCommits and instead swap the
defaults of core.commitGraph and gc.writeCommitGraph to true if we think
that is what we want to do for 2.24.0. We can use the repo settings and
UPDATE_DEFAULT[_BOOL] for this purpose.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 31, 2019

This patch series was integrated into pu via git@159b555.

The commit-graph feature has seen a lot of activity in the past
year or so since it was introduced. The feature is a critical
performance enhancement for medium- to large-sized repos, and
does not significantly hurt small repos.

Change the defaults for core.commitGraph and gc.writeCommitGraph
to true so users benefit from this feature by default.

There are several places in the test suite where the environment
variable GIT_TEST_COMMIT_GRAPH is disabled to avoid reading a
commit-graph, if it exists. The config option overrides the
environment, so swap these. Some GIT_TEST_COMMIT_GRAPH assignments
remain, and those are to avoid writing a commit-graph when a new
commit is created.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The core.untrackedCache config setting is slightly complicated,
so clarify its use and centralize its parsing into the repo
settings.

The default value is "keep" (returned as -1), which persists the
untracked cache if it exists.

If the value is set as "false" (returned as 0), then remove the
untracked cache if it exists.

If the value is set as "true" (returned as 1), then write the
untracked cache and persist it.

Instead of relying on magic values of -1, 0, and 1, split these
options into an enum. This allows the use of "-1" as a
default value. After parsing the config options, if the value is
unset we can initialize it to UNTRACKED_CACHE_KEEP.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The feature.manyFiles setting is suitable for repos with many
files in the working directory. By setting index.version=4 and
core.untrackedCache=true, commands such as 'git status' should
improve.

While adding this setting, modify the index version precedence
tests to check how this setting overrides the default for
index.version is unset.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The 'feature.experimental' setting includes config options that are
not committed to become defaults, but could use additional testing.

Update the following config settings to take new defaults, and to
use the repo_settings struct if not already using them:

* 'pack.useSparse=true'

* 'fetch.negotiationAlgorithm=skipping'

In the case of fetch.negotiationAlgorithm, the existing logic
would load the config option only when about to use the setting,
so had a die() statement on an unknown string value. This is
removed as now the config is parsed under prepare_repo_settings().
In general, this die() is probably misplaced and not valuable.
A test was removed that checked this die() statement executed.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 13, 2019

Submitted as pull.292.v4.git.gitgitgadget@gmail.com

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 13, 2019

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  * Update docs and 0 constant in feature.experimental around
>    merge.directoryRenames.

I somehow thought that the concensus was that *.experimental is for
things that should become the future default, and this one is not
something we would want to make the default.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 13, 2019

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>  * Update docs and 0 constant in feature.experimental around
>>    merge.directoryRenames.
>
> I somehow thought that the concensus was that *.experimental is for
> things that should become the future default, and this one is not
> something we would want to make the default.

Ah, I see merge.directoryRenames is gone.  Sorry for a confusion.



@gitgitgadget
Copy link

gitgitgadget bot commented Aug 13, 2019

This patch series was integrated into pu via git@99c689d.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 14, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/13/2019 5:08 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>>  * Update docs and 0 constant in feature.experimental around
>>>    merge.directoryRenames.
>>
>> I somehow thought that the concensus was that *.experimental is for
>> things that should become the future default, and this one is not
>> something we would want to make the default.
> 
> Ah, I see merge.directoryRenames is gone.  Sorry for a confusion.

Confusion is my fault. This was a terrible way to say, "I cleaned up the
adjustments to merge.directoryRenames AND removed it from features.experimental"

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 14, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/14/2019 6:32 AM, Derrick Stolee wrote:
> On 8/13/2019 5:08 PM, Junio C Hamano wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>>  * Update docs and 0 constant in feature.experimental around
>>>>    merge.directoryRenames.
>>>
>>> I somehow thought that the concensus was that *.experimental is for
>>> things that should become the future default, and this one is not
>>> something we would want to make the default.
>>
>> Ah, I see merge.directoryRenames is gone.  Sorry for a confusion.
> 
> Confusion is my fault. This was a terrible way to say, "I cleaned up the
> adjustments to merge.directoryRenames AND removed it from features.experimental"

It's too early for me to say things... let's try this again.

1. merge.directoryRenames is removed from this series.

2. The commit where I did the cleanup according to Elijahs' recommendation is 
   available as a tag, as mentioned in [1].

[1] https://public-inbox.org/git/1a2bf8e1-5cc1-c626-9dfc-dd6dc58fa0ad@gmail.com/

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 14, 2019

This patch series was integrated into pu via git@d4c4d06.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 15, 2019

This patch series was integrated into pu via git@adf1429.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 16, 2019

This patch series was integrated into pu via git@1080cce.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 19, 2019

This patch series was integrated into pu via git@e9f1ed3.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 20, 2019

This patch series was integrated into pu via git@92c72e8.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 21, 2019

This patch series was integrated into pu via git@e82ff1f.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2019

This patch series was integrated into pu via git@4ae1d01.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2019

This patch series was integrated into pu via git@8f776be.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 24, 2019

This patch series was integrated into pu via git@25dc9d3.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 24, 2019

This patch series was integrated into next via git@115f0c8.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 26, 2019

This patch series was integrated into pu via git@9bfb6ef.

@@ -577,7 +577,7 @@ the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1].

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/repo-settings.c b/repo-settings.c
> index 309577f6bc..d00b675687 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -2,6 +2,8 @@
>  #include "config.h"
>  #include "repository.h"
>  
> +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
> +
>  void prepare_repo_settings(struct repository *r)
>  {
>  	int value;
> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>  		r->settings.core_commit_graph = value;
>  	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>  		r->settings.gc_write_commit_graph = value;
> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);


This is a "review comment" that is more than 2 years late X-<, but I
noticed that this is used to muck with a structure that was
initialized by filling it with \0377 bytes.

+           /* Defaults */
+           memset(&r->settings, -1, sizeof(r->settings));

but the structure is is full of "int" and "enum", so apparently this
works only on 2's complement architecture.

        struct repo_settings {
                int initialized;

                int core_commit_graph;
                int commit_graph_read_changed_paths;
                int gc_write_commit_graph;
                int fetch_write_commit_graph;

                int index_version;
                enum untracked_cache_setting core_untracked_cache;

                int pack_use_sparse;
                enum fetch_negotiation_setting fetch_negotiation_algorithm;

                int core_multi_pack_index;

                unsigned command_requires_full_index:1,
                         sparse_index:1;
        };

I see that the earliest iteration of this series [*1*] set the
default explicitly using assignments of the correct types, like
this:


+void prepare_repo_settings(struct repository *r)
+{
+       if (r->settings)
+          return;
+
+       r->settings = xmalloc(sizeof(*r->settings));
+
+       /* Defaults */
+       r->settings->core_commit_graph = -1;
+       r->settings->gc_write_commit_graph = -1;
+       r->settings->pack_use_sparse = -1;
+       r->settings->index_version = -1;
+ ...

which I think should be a reasonable starting point to fix the
current code.

Another thing I noticed is that while it may have been only for
setting the default value for a boolean variable initially, other
changes abuse the macro to set an arbitrary integer values to
integer members of the structure, e.g. c6cc4c5a (repo-settings:
create feature.manyFiles setting, 2019-08-13) sets 4 to the
index_version (naturally, the choice between 0 and 1 does not make
much sense for the member), and ad0fb659 (repo-settings: parse
core.untrackedCache, 2019-08-13) stuffs UNTRACKED_CACHE_* enum to
core_untracked_cache.  The UPDATE_DEFAULT_BOOL() macro should be
renamed to UPDATE_DEFAULT_INT() at least, I would think, to save
readers from confusion.

Thanks.


[Reference]

*1* https://lore.kernel.org/git/72f652b89c71526cc423e7812de66f41a079f181.1563818059.git.gitgitgadget@gmail.com/

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, "brian m. carlson" wrote (reply to this):


--5s7j4Fz6688k7IB7
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2021-09-20 at 00:42:57, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>=20
> > diff --git a/repo-settings.c b/repo-settings.c
> > index 309577f6bc..d00b675687 100644
> > --- a/repo-settings.c
> > +++ b/repo-settings.c
> > @@ -2,6 +2,8 @@
> >  #include "config.h"
> >  #include "repository.h"
> > =20
> > +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s =3D=3D -1) { s =3D v; } } =
while(0)
> > +
> >  void prepare_repo_settings(struct repository *r)
> >  {
> >  	int value;
> > @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
> >  		r->settings.core_commit_graph =3D value;
> >  	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
> >  		r->settings.gc_write_commit_graph =3D value;
> > +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
> > +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>=20
>=20
> This is a "review comment" that is more than 2 years late X-<, but I
> noticed that this is used to muck with a structure that was
> initialized by filling it with \0377 bytes.
>=20
> +           /* Defaults */
> +           memset(&r->settings, -1, sizeof(r->settings));
>=20
> but the structure is is full of "int" and "enum", so apparently this
> works only on 2's complement architecture.

This statement is true, but are there systems capable of running Git
which don't use two's complement?  Rust requires two's complement signed
integers, and there's a proposal[0] to the C++ working group to only
support two's complement because "[t]o the author=E2=80=99s knowledge no mo=
dern
machine uses both C++ and a signed integer representation other than
two=E2=80=99s complement".  That proposal goes on to note that none of MSVC,
GCC, or LLVM support other options.

Personally I am not aware of any modern processor which provides signed
integer types using other than two's complement.

[0] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html
--=20
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

--5s7j4Fz6688k7IB7
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.3.1 (GNU/Linux)

iHUEABYKAB0WIQQILOaKnbxl+4PRw5F8DEliiIeigQUCYUfjIAAKCRB8DEliiIei
gYVvAP9j8xao1G9SV14eHk8QWj8yHhZIXo7aLn2ZvbegtYgxSgD+Ibmo1ud/ywED
G7idr/EMPu1ji0NudbVHkrlVPdYd4AA=
=NrBC
-----END PGP SIGNATURE-----

--5s7j4Fz6688k7IB7--

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Sun, Sep 19 2021, Junio C Hamano wrote:

> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> diff --git a/repo-settings.c b/repo-settings.c
>> index 309577f6bc..d00b675687 100644
>> --- a/repo-settings.c
>> +++ b/repo-settings.c
>> @@ -2,6 +2,8 @@
>>  #include "config.h"
>>  #include "repository.h"
>>  
>> +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
>> +
>>  void prepare_repo_settings(struct repository *r)
>>  {
>>  	int value;
>> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>>  		r->settings.core_commit_graph = value;
>>  	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>>  		r->settings.gc_write_commit_graph = value;
>> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
>> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>
>
> This is a "review comment" that is more than 2 years late X-<, but I
> noticed that this is used to muck with a structure that was
> initialized by filling it with \0377 bytes.
>
> +           /* Defaults */
> +           memset(&r->settings, -1, sizeof(r->settings));
>
> but the structure is is full of "int" and "enum", so apparently this
> works only on 2's complement architecture.
>
>         struct repo_settings {
>                 int initialized;
>
>                 int core_commit_graph;
>                 int commit_graph_read_changed_paths;
>                 int gc_write_commit_graph;
>                 int fetch_write_commit_graph;
>
>                 int index_version;
>                 enum untracked_cache_setting core_untracked_cache;
>
>                 int pack_use_sparse;
>                 enum fetch_negotiation_setting fetch_negotiation_algorithm;
>
>                 int core_multi_pack_index;
>
>                 unsigned command_requires_full_index:1,
>                          sparse_index:1;
>         };
>
> I see that the earliest iteration of this series [*1*] set the
> default explicitly using assignments of the correct types, like
> this:
>
>
> +void prepare_repo_settings(struct repository *r)
> +{
> +       if (r->settings)
> +          return;
> +
> +       r->settings = xmalloc(sizeof(*r->settings));
> +
> +       /* Defaults */
> +       r->settings->core_commit_graph = -1;
> +       r->settings->gc_write_commit_graph = -1;
> +       r->settings->pack_use_sparse = -1;
> +       r->settings->index_version = -1;
> + ...
>
> which I think should be a reasonable starting point to fix the
> current code.
>
> Another thing I noticed is that while it may have been only for
> setting the default value for a boolean variable initially, other
> changes abuse the macro to set an arbitrary integer values to
> integer members of the structure, e.g. c6cc4c5a (repo-settings:
> create feature.manyFiles setting, 2019-08-13) sets 4 to the
> index_version (naturally, the choice between 0 and 1 does not make
> much sense for the member), and ad0fb659 (repo-settings: parse
> core.untrackedCache, 2019-08-13) stuffs UNTRACKED_CACHE_* enum to
> core_untracked_cache.  The UPDATE_DEFAULT_BOOL() macro should be
> renamed to UPDATE_DEFAULT_INT() at least, I would think, to save
> readers from confusion.

Yes this is all a bit weird and/or broken, but I'm a bit perplexed at
this reply to a 2+ year old E-Mail given my outstanding series to fix
all these issues you've noted here[1] posted in the last few days, and
you having read (at least part of) it[1].

But then again, the last patch you left a comment on was 3/5. It's 4/5
that fixes all the issues you note above[2] :)

The macro is gone, so is the memset to -1 and other weird emergent
behavior. We can rely on repo_init() having zero'd the structure for us,
and we just proceed to set sensible defaults in a way that doesn't stomp
over the types in the struct.

1. https://lore.kernel.org/git/cover-v3-0.5-00000000000-20210919T084703Z-avarab@gmail.com/
2. https://lore.kernel.org/git/patch-v3-4.5-28286a61162-20210919T084703Z-avarab@gmail.com/

> [Reference]
>
> *1* https://lore.kernel.org/git/72f652b89c71526cc423e7812de66f41a079f181.1563818059.git.gitgitgadget@gmail.com/

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Phillip Wood wrote (reply to this):

On 20/09/2021 02:25, brian m. carlson wrote:
> On 2021-09-20 at 00:42:57, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> diff --git a/repo-settings.c b/repo-settings.c
>>> index 309577f6bc..d00b675687 100644
>>> --- a/repo-settings.c
>>> +++ b/repo-settings.c
>>> @@ -2,6 +2,8 @@
>>>   #include "config.h"
>>>   #include "repository.h"
>>>   
>>> +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
>>> +
>>>   void prepare_repo_settings(struct repository *r)
>>>   {
>>>   	int value;
>>> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>>>   		r->settings.core_commit_graph = value;
>>>   	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>>>   		r->settings.gc_write_commit_graph = value;
>>> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
>>> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>>
>>
>> This is a "review comment" that is more than 2 years late X-<, but I
>> noticed that this is used to muck with a structure that was
>> initialized by filling it with \0377 bytes.
>>
>> +           /* Defaults */
>> +           memset(&r->settings, -1, sizeof(r->settings));
>>
>> but the structure is is full of "int" and "enum", so apparently this
>> works only on 2's complement architecture.
> 
> This statement is true, but are there systems capable of running Git
> which don't use two's complement?  Rust requires two's complement signed
> integers, and there's a proposal[0] to the C++ working group to only
> support two's complement because "[t]o the author’s knowledge no modern
> machine uses both C++ and a signed integer representation other than
> two’s complement".  That proposal goes on to note that none of MSVC,
> GCC, or LLVM support other options.

A similar proposal [1] is included in the draft of the next C standard 
[2]. As integer representation is implementation defined I believe this 
code has well defined behavior on 2's complement implementations. If an 
enum has no negative members then the compiler may choose an unsigned 
representation but even then the comparison to -1 is well defined. In 
this case I'm pretty sure the enums all have -1 as a member so are 
signed. Using memset() to initialize the struct eases future maintenance 
when members are added or removed and seems to me to be a sensible 
design choice.

Best Wishes

Phillip

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf

> Personally I am not aware of any modern processor which provides signed
> integer types using other than two's complement.
> 
> [0] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Sep 20 2021, Phillip Wood wrote:

> On 20/09/2021 02:25, brian m. carlson wrote:
>> On 2021-09-20 at 00:42:57, Junio C Hamano wrote:
>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>> diff --git a/repo-settings.c b/repo-settings.c
>>>> index 309577f6bc..d00b675687 100644
>>>> --- a/repo-settings.c
>>>> +++ b/repo-settings.c
>>>> @@ -2,6 +2,8 @@
>>>>   #include "config.h"
>>>>   #include "repository.h"
>>>>   +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } }
>>>> while(0)
>>>> +
>>>>   void prepare_repo_settings(struct repository *r)
>>>>   {
>>>>   	int value;
>>>> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>>>>   		r->settings.core_commit_graph = value;
>>>>   	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>>>>   		r->settings.gc_write_commit_graph = value;
>>>> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
>>>> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>>>
>>>
>>> This is a "review comment" that is more than 2 years late X-<, but I
>>> noticed that this is used to muck with a structure that was
>>> initialized by filling it with \0377 bytes.
>>>
>>> +           /* Defaults */
>>> +           memset(&r->settings, -1, sizeof(r->settings));
>>>
>>> but the structure is is full of "int" and "enum", so apparently this
>>> works only on 2's complement architecture.
>> This statement is true, but are there systems capable of running Git
>> which don't use two's complement?  Rust requires two's complement signed
>> integers, and there's a proposal[0] to the C++ working group to only
>> support two's complement because "[t]o the author’s knowledge no modern
>> machine uses both C++ and a signed integer representation other than
>> two’s complement".  That proposal goes on to note that none of MSVC,
>> GCC, or LLVM support other options.
>
> A similar proposal [1] is included in the draft of the next C standard
> [2]. As integer representation is implementation defined I believe
> this code has well defined behavior on 2's complement
> implementations. If an enum has no negative members then the compiler
> may choose an unsigned representation but even then the comparison to
> -1 is well defined.

That's informative, thanks.

> In this case I'm pretty sure the enums all have -1
> as a member so are signed. Using memset() to initialize the struct
> eases future maintenance when members are added or removed and seems
> to me to be a sensible design choice.

It's really not sensible at all in this particular case, as I think my
[1] which gets rid of the pattern convincingly argues.

I.e. the only reason it had a memset() of -1 after we'd already memset
it to 0 was because the function was tripping over itself and setting
defaults in the wrong order for no good reason.

I.e. it was doing things like (pseudocode);

    memset(&data, -1, ...)
    if_config_is_true_set("x.y", data.x_y);
    if (data.x_y == -1)
        data.x_y = x_y_default();

When we can instead just do:

    data.x_y = x_y_default();
    set_if_cfg_key_exists("x.y", &data.x_y);

Which is how we e.g. handle options parsing, we have hardcoded defaults,
then read defaults from config, then set options, in that order.

We don't set options, then check if each value is still -1, if so read
config etc. Just read them in priority order, doing it any other way is
just make-work for something that's the equivalent of a simple
short-circuit || operation.

Anyway, there are other cases where we need to read something and
distinguish e.g. false/true from "unset", and there a -1,0,1 tri-state
serves us well.

But even in those cases what repo-settings.c was doing of memsetting the
entire struct to -1 (2's compliment aside) just makes for needlessly
hard to read code.

If we've got some members that need -1 defaults we should instead have
that in an *_INIT macro or equivalent. The pre-[1] repo-settings.c also
has code like this pseudocode:

    data.a_b = -1; /* default for a bi-state, not tri-state variable */
    set_if_cfg_key_exists("a.b", &data.a_b);
    if (data.a_b == -1)
        data.a_b = 1; /* on by default */

Which, urm, you can just do as:

    data.a_b = 1; /* on by default */
    set_if_cfg_key_exists("a.b", &data.a_b);

I.e. the setup for things that never wanted or cared about being set to
-1 was complicated by them needing to un-set themselves from a -1
default they never wanted.

Thus the anti-pattern, yes set defaults for some members to -1, but not
the entire struct. The only value we should memset a whole bag-of-stuff
config struct to is 0, as that's the only sensible default & plays well
with other C semantics.

1. https://lore.kernel.org/git/patch-v3-4.5-28286a61162-20210919T084703Z-avarab@gmail.com/

>
> Best Wishes
>
> Phillip
>
> [1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
> [2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf
>
>> Personally I am not aware of any modern processor which provides signed
>> integer types using other than two's complement.
>> [0]
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html
>> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Mon, Sep 20, 2021 at 03:30:38PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Thus the anti-pattern, yes set defaults for some members to -1, but not
> the entire struct. The only value we should memset a whole bag-of-stuff
> config struct to is 0, as that's the only sensible default & plays well
> with other C semantics.

FWIW, I agree. I had to scratch my head for a moment at why a memset of
"-1" would work at all on multi-byte types. I think it's better avoided
in the name of readability and obviousness, not to mention the trap it
leaves for items that don't sensibly initialize with it (like, say,
pointers).

As an aside, memset to 0 is _also_ undefined for pointers, but we long
ago decided not to care, as no real-world systems have a problem with
this.

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Phillip Wood wrote (reply to this):

On 20/09/2021 14:30, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Sep 20 2021, Phillip Wood wrote:
> 
>> On 20/09/2021 02:25, brian m. carlson wrote:
>>> On 2021-09-20 at 00:42:57, Junio C Hamano wrote:
>>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>>
>>>>> diff --git a/repo-settings.c b/repo-settings.c
>>>>> index 309577f6bc..d00b675687 100644
>>>>> --- a/repo-settings.c
>>>>> +++ b/repo-settings.c
>>>>> @@ -2,6 +2,8 @@
>>>>>    #include "config.h"
>>>>>    #include "repository.h"
>>>>>    +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } }
>>>>> while(0)
>>>>> +
>>>>>    void prepare_repo_settings(struct repository *r)
>>>>>    {
>>>>>    	int value;
>>>>> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>>>>>    		r->settings.core_commit_graph = value;
>>>>>    	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>>>>>    		r->settings.gc_write_commit_graph = value;
>>>>> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
>>>>> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>>>>
>>>>
>>>> This is a "review comment" that is more than 2 years late X-<, but I
>>>> noticed that this is used to muck with a structure that was
>>>> initialized by filling it with \0377 bytes.
>>>>
>>>> +           /* Defaults */
>>>> +           memset(&r->settings, -1, sizeof(r->settings));
>>>>
>>>> but the structure is is full of "int" and "enum", so apparently this
>>>> works only on 2's complement architecture.
>>> This statement is true, but are there systems capable of running Git
>>> which don't use two's complement?  Rust requires two's complement signed
>>> integers, and there's a proposal[0] to the C++ working group to only
>>> support two's complement because "[t]o the author’s knowledge no modern
>>> machine uses both C++ and a signed integer representation other than
>>> two’s complement".  That proposal goes on to note that none of MSVC,
>>> GCC, or LLVM support other options.
>>
>> A similar proposal [1] is included in the draft of the next C standard
>> [2]. As integer representation is implementation defined I believe
>> this code has well defined behavior on 2's complement
>> implementations. If an enum has no negative members then the compiler
>> may choose an unsigned representation but even then the comparison to
>> -1 is well defined.
> 
> That's informative, thanks.
> 
>> In this case I'm pretty sure the enums all have -1
>> as a member so are signed. Using memset() to initialize the struct
>> eases future maintenance when members are added or removed and seems
>> to me to be a sensible design choice.
> 
> It's really not sensible at all in this particular case, as I think my
> [1] which gets rid of the pattern convincingly argues.

I meant that it was a sensible way to initialize all the struct members 
to -1, I did not mean to comment either way on whether such an 
initialization was sensible. If we can just store the default and then 
update from the user's config that sounds sensible.

Best Wishes

Phillip

> I.e. the only reason it had a memset() of -1 after we'd already memset
> it to 0 was because the function was tripping over itself and setting
> defaults in the wrong order for no good reason.
> 
> I.e. it was doing things like (pseudocode);
> 
>      memset(&data, -1, ...)
>      if_config_is_true_set("x.y", data.x_y);
>      if (data.x_y == -1)
>          data.x_y = x_y_default();
> 
> When we can instead just do:
> 
>      data.x_y = x_y_default();
>      set_if_cfg_key_exists("x.y", &data.x_y);
> 
> Which is how we e.g. handle options parsing, we have hardcoded defaults,
> then read defaults from config, then set options, in that order.
> 
> We don't set options, then check if each value is still -1, if so read
> config etc. Just read them in priority order, doing it any other way is
> just make-work for something that's the equivalent of a simple
> short-circuit || operation.
> 
> Anyway, there are other cases where we need to read something and
> distinguish e.g. false/true from "unset", and there a -1,0,1 tri-state
> serves us well.
> 
> But even in those cases what repo-settings.c was doing of memsetting the
> entire struct to -1 (2's compliment aside) just makes for needlessly
> hard to read code.
> 
> If we've got some members that need -1 defaults we should instead have
> that in an *_INIT macro or equivalent. The pre-[1] repo-settings.c also
> has code like this pseudocode:
> 
>      data.a_b = -1; /* default for a bi-state, not tri-state variable */
>      set_if_cfg_key_exists("a.b", &data.a_b);
>      if (data.a_b == -1)
>          data.a_b = 1; /* on by default */
> 
> Which, urm, you can just do as:
> 
>      data.a_b = 1; /* on by default */
>      set_if_cfg_key_exists("a.b", &data.a_b);
> 
> I.e. the setup for things that never wanted or cared about being set to
> -1 was complicated by them needing to un-set themselves from a -1
> default they never wanted.
> 
> Thus the anti-pattern, yes set defaults for some members to -1, but not
> the entire struct. The only value we should memset a whole bag-of-stuff
> config struct to is 0, as that's the only sensible default & plays well
> with other C semantics.
> 
> 1. https://lore.kernel.org/git/patch-v3-4.5-28286a61162-20210919T084703Z-avarab@gmail.com/
> 
>>
>> Best Wishes
>>
>> Phillip
>>
>> [1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
>> [2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf
>>
>>> Personally I am not aware of any modern processor which provides signed
>>> integer types using other than two's complement.
>>> [0]
>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html
>>>
> 

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 20, 2021

User "brian m. carlson" <sandals@crustytoothpaste.net> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 20, 2021

User Phillip Wood <phillip.wood123@gmail.com> has been added to the cc: list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants