Create 'feature.*' config area and some centralized config parsing #292

Closed
2 changes: 1 addition & 1 deletion Documentation/config/core.txt
@@ -577,7 +577,7 @@ the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1].

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/repo-settings.c b/repo-settings.c
> index 309577f6bc..d00b675687 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -2,6 +2,8 @@
>  #include "config.h"
>  #include "repository.h"
>  
> +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
> +
>  void prepare_repo_settings(struct repository *r)
>  {
>  	int value;
> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>  		r->settings.core_commit_graph = value;
>  	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>  		r->settings.gc_write_commit_graph = value;
> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);


This is a "review comment" that is more than 2 years late X-<, but I
noticed that this is used to muck with a structure that was
initialized by filling it with \0377 bytes.

+           /* Defaults */
+           memset(&r->settings, -1, sizeof(r->settings));

but the structure is full of "int" and "enum", so apparently this
works only on 2's complement architecture.

        struct repo_settings {
                int initialized;

                int core_commit_graph;
                int commit_graph_read_changed_paths;
                int gc_write_commit_graph;
                int fetch_write_commit_graph;

                int index_version;
                enum untracked_cache_setting core_untracked_cache;

                int pack_use_sparse;
                enum fetch_negotiation_setting fetch_negotiation_algorithm;

                int core_multi_pack_index;

                unsigned command_requires_full_index:1,
                         sparse_index:1;
        };
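To make the concern concrete, here is a minimal sketch using a hypothetical `demo_settings` struct (a stand-in, not Git's real `struct repo_settings`). Filling the struct with 0xFF bytes produces -1 in every `int` member only where signed integers are two's complement with no padding bits, whereas explicit assignment does not depend on the representation.

```c
#include <string.h>

/* Hypothetical stand-in for struct repo_settings; not Git's definition. */
struct demo_settings {
	int core_commit_graph;
	int gc_write_commit_graph;
	int index_version;
};

/* Byte-fill approach: every byte of the struct becomes 0xFF. */
void init_by_memset(struct demo_settings *s)
{
	memset(s, -1, sizeof(*s));
}

/* Explicit approach: each member is assigned a typed "unset" value. */
void init_by_assignment(struct demo_settings *s)
{
	s->core_commit_graph = -1;
	s->gc_write_commit_graph = -1;
	s->index_version = -1;
}

/* Returns 1 when both initializers produce identical member values. */
int initializers_agree(void)
{
	struct demo_settings a, b;

	init_by_memset(&a);
	init_by_assignment(&b);
	return a.core_commit_graph == b.core_commit_graph &&
	       a.gc_write_commit_graph == b.gc_write_commit_graph &&
	       a.index_version == b.index_version;
}
```

On a two's complement platform `initializers_agree()` is expected to return 1; the portability question raised above is whether that can be relied upon everywhere.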

I see that the earliest iteration of this series [*1*] set the
default explicitly using assignments of the correct types, like
this:


+void prepare_repo_settings(struct repository *r)
+{
+       if (r->settings)
+          return;
+
+       r->settings = xmalloc(sizeof(*r->settings));
+
+       /* Defaults */
+       r->settings->core_commit_graph = -1;
+       r->settings->gc_write_commit_graph = -1;
+       r->settings->pack_use_sparse = -1;
+       r->settings->index_version = -1;
+ ...

which I think should be a reasonable starting point to fix the
current code.

Another thing I noticed is that while it may have been only for
setting the default value for a boolean variable initially, other
changes abuse the macro to assign arbitrary integer values to
integer members of the structure, e.g. c6cc4c5a (repo-settings:
create feature.manyFiles setting, 2019-08-13) sets 4 to the
index_version (naturally, the choice between 0 and 1 does not make
much sense for the member), and ad0fb659 (repo-settings: parse
core.untrackedCache, 2019-08-13) stuffs UNTRACKED_CACHE_* enum to
core_untracked_cache.  The UPDATE_DEFAULT_BOOL() macro should be
renamed to UPDATE_DEFAULT_INT() at least, I would think, to save
readers from confusion.
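A sketch of what the suggested rename could look like, assuming a hypothetical `demo_settings` struct and enum (the names below are illustrative and not taken from Git). The macro arguments are also parenthesized here, which the original macro did not do; that is an addition of this sketch, not part of the suggestion.

```c
/* Hypothetical rename: the macro fills in any member still at -1. */
#define UPDATE_DEFAULT_INT(s, v) do { if ((s) == -1) { (s) = (v); } } while (0)

/* Illustrative enum with -1 as the "unset" marker. */
enum demo_untracked_cache {
	DEMO_UC_UNSET = -1,
	DEMO_UC_KEEP = 0,
	DEMO_UC_WRITE = 1
};

struct demo_settings {
	int index_version;                         /* -1 means "not configured" */
	enum demo_untracked_cache untracked_cache; /* DEMO_UC_UNSET means "not configured" */
};

/* Applies defaults only to members the user has not configured. */
void apply_many_files_defaults(struct demo_settings *s)
{
	UPDATE_DEFAULT_INT(s->index_version, 4);
	UPDATE_DEFAULT_INT(s->untracked_cache, DEMO_UC_WRITE);
}
```

With the wider name, a call that stores 4 or an enum value no longer reads as if it were toggling a boolean.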

Thanks.


[Reference]

*1* https://lore.kernel.org/git/72f652b89c71526cc423e7812de66f41a079f181.1563818059.git.gitgitgadget@gmail.com/

On the Git mailing list, "brian m. carlson" wrote (reply to this):


On 2021-09-20 at 00:42:57, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > diff --git a/repo-settings.c b/repo-settings.c
> > index 309577f6bc..d00b675687 100644
> > --- a/repo-settings.c
> > +++ b/repo-settings.c
> > @@ -2,6 +2,8 @@
> >  #include "config.h"
> >  #include "repository.h"
> >  
> > +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
> > +
> >  void prepare_repo_settings(struct repository *r)
> >  {
> >  	int value;
> > @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
> >  		r->settings.core_commit_graph = value;
> >  	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
> >  		r->settings.gc_write_commit_graph = value;
> > +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
> > +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>
>
> This is a "review comment" that is more than 2 years late X-<, but I
> noticed that this is used to muck with a structure that was
> initialized by filling it with \0377 bytes.
>
> +           /* Defaults */
> +           memset(&r->settings, -1, sizeof(r->settings));
>
> but the structure is full of "int" and "enum", so apparently this
> works only on 2's complement architecture.

This statement is true, but are there systems capable of running Git
which don't use two's complement?  Rust requires two's complement signed
integers, and there's a proposal[0] to the C++ working group to only
support two's complement because "[t]o the author’s knowledge no modern
machine uses both C++ and a signed integer representation other than
two’s complement".  That proposal goes on to note that none of MSVC,
GCC, or LLVM support other options.

Personally I am not aware of any modern processor which provides signed
integer types using other than two's complement.

[0] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Sun, Sep 19 2021, Junio C Hamano wrote:

> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> diff --git a/repo-settings.c b/repo-settings.c
>> index 309577f6bc..d00b675687 100644
>> --- a/repo-settings.c
>> +++ b/repo-settings.c
>> @@ -2,6 +2,8 @@
>>  #include "config.h"
>>  #include "repository.h"
>>  
>> +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
>> +
>>  void prepare_repo_settings(struct repository *r)
>>  {
>>  	int value;
>> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>>  		r->settings.core_commit_graph = value;
>>  	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>>  		r->settings.gc_write_commit_graph = value;
>> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
>> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>
>
> This is a "review comment" that is more than 2 years late X-<, but I
> noticed that this is used to muck with a structure that was
> initialized by filling it with \0377 bytes.
>
> +           /* Defaults */
> +           memset(&r->settings, -1, sizeof(r->settings));
>
> but the structure is full of "int" and "enum", so apparently this
> works only on 2's complement architecture.
>
>         struct repo_settings {
>                 int initialized;
>
>                 int core_commit_graph;
>                 int commit_graph_read_changed_paths;
>                 int gc_write_commit_graph;
>                 int fetch_write_commit_graph;
>
>                 int index_version;
>                 enum untracked_cache_setting core_untracked_cache;
>
>                 int pack_use_sparse;
>                 enum fetch_negotiation_setting fetch_negotiation_algorithm;
>
>                 int core_multi_pack_index;
>
>                 unsigned command_requires_full_index:1,
>                          sparse_index:1;
>         };
>
> I see that the earliest iteration of this series [*1*] set the
> default explicitly using assignments of the correct types, like
> this:
>
>
> +void prepare_repo_settings(struct repository *r)
> +{
> +       if (r->settings)
> +          return;
> +
> +       r->settings = xmalloc(sizeof(*r->settings));
> +
> +       /* Defaults */
> +       r->settings->core_commit_graph = -1;
> +       r->settings->gc_write_commit_graph = -1;
> +       r->settings->pack_use_sparse = -1;
> +       r->settings->index_version = -1;
> + ...
>
> which I think should be a reasonable starting point to fix the
> current code.
>
> Another thing I noticed is that while it may have been only for
> setting the default value for a boolean variable initially, other
> changes abuse the macro to set an arbitrary integer values to
> integer members of the structure, e.g. c6cc4c5a (repo-settings:
> create feature.manyFiles setting, 2019-08-13) sets 4 to the
> index_version (naturally, the choice between 0 and 1 does not make
> much sense for the member), and ad0fb659 (repo-settings: parse
> core.untrackedCache, 2019-08-13) stuffs UNTRACKED_CACHE_* enum to
> core_untracked_cache.  The UPDATE_DEFAULT_BOOL() macro should be
> renamed to UPDATE_DEFAULT_INT() at least, I would think, to save
> readers from confusion.

Yes this is all a bit weird and/or broken, but I'm a bit perplexed at
this reply to a 2+ year old E-Mail given my outstanding series to fix
all these issues you've noted here[1] posted in the last few days, and
you having read (at least part of) it[1].

But then again, the last patch you left a comment on was 3/5. It's 4/5
that fixes all the issues you note above[2] :)

The macro is gone, so is the memset to -1 and other weird emergent
behavior. We can rely on repo_init() having zero'd the structure for us,
and we just proceed to set sensible defaults in a way that doesn't stomp
over the types in the struct.

1. https://lore.kernel.org/git/cover-v3-0.5-00000000000-20210919T084703Z-avarab@gmail.com/
2. https://lore.kernel.org/git/patch-v3-4.5-28286a61162-20210919T084703Z-avarab@gmail.com/

> [Reference]
>
> *1* https://lore.kernel.org/git/72f652b89c71526cc423e7812de66f41a079f181.1563818059.git.gitgitgadget@gmail.com/

On the Git mailing list, Phillip Wood wrote (reply to this):

On 20/09/2021 02:25, brian m. carlson wrote:
> On 2021-09-20 at 00:42:57, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> diff --git a/repo-settings.c b/repo-settings.c
>>> index 309577f6bc..d00b675687 100644
>>> --- a/repo-settings.c
>>> +++ b/repo-settings.c
>>> @@ -2,6 +2,8 @@
>>>   #include "config.h"
>>>   #include "repository.h"
>>>   
>>> +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
>>> +
>>>   void prepare_repo_settings(struct repository *r)
>>>   {
>>>   	int value;
>>> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>>>   		r->settings.core_commit_graph = value;
>>>   	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>>>   		r->settings.gc_write_commit_graph = value;
>>> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
>>> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>>
>>
>> This is a "review comment" that is more than 2 years late X-<, but I
>> noticed that this is used to muck with a structure that was
>> initialized by filling it with \0377 bytes.
>>
>> +           /* Defaults */
>> +           memset(&r->settings, -1, sizeof(r->settings));
>>
>> but the structure is full of "int" and "enum", so apparently this
>> works only on 2's complement architecture.
> 
> This statement is true, but are there systems capable of running Git
> which don't use two's complement?  Rust requires two's complement signed
> integers, and there's a proposal[0] to the C++ working group to only
> support two's complement because "[t]o the author’s knowledge no modern
> machine uses both C++ and a signed integer representation other than
> two’s complement".  That proposal goes on to note that none of MSVC,
> GCC, or LLVM support other options.

A similar proposal [1] is included in the draft of the next C standard 
[2]. As integer representation is implementation defined I believe this 
code has well defined behavior on 2's complement implementations. If an 
enum has no negative members then the compiler may choose an unsigned 
representation but even then the comparison to -1 is well defined. In 
this case I'm pretty sure the enums all have -1 as a member so are 
signed. Using memset() to initialize the struct eases future maintenance 
when members are added or removed and seems to me to be a sensible 
design choice.
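
The enum case can be checked with a small sketch. The enum below is hypothetical (Git's own enums are not reproduced here). Because it has a -1 enumerator, its underlying type has to be signed, so on a two's complement implementation a 0xFF byte fill is expected to compare equal to that enumerator.

```c
#include <string.h>

/* Illustrative tri-state enum; the -1 member forces a signed representation. */
enum demo_tristate {
	DEMO_UNSET = -1,
	DEMO_OFF = 0,
	DEMO_ON = 1
};

/* Returns 1 if an enum filled with 0xFF bytes reads back as DEMO_UNSET. */
int byte_filled_enum_is_unset(void)
{
	enum demo_tristate e;

	memset(&e, -1, sizeof(e));
	return e == DEMO_UNSET;
}
```

This only illustrates the representation argument; it says nothing about whether byte-filling a whole struct is the clearer design.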

Best Wishes

Phillip

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf

> Personally I am not aware of any modern processor which provides signed
> integer types using other than two's complement.
> 
> [0] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html
> 

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Sep 20 2021, Phillip Wood wrote:

> On 20/09/2021 02:25, brian m. carlson wrote:
>> On 2021-09-20 at 00:42:57, Junio C Hamano wrote:
>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>> diff --git a/repo-settings.c b/repo-settings.c
>>>> index 309577f6bc..d00b675687 100644
>>>> --- a/repo-settings.c
>>>> +++ b/repo-settings.c
>>>> @@ -2,6 +2,8 @@
>>>>   #include "config.h"
>>>>   #include "repository.h"
>>>>   
>>>> +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
>>>> +
>>>>   void prepare_repo_settings(struct repository *r)
>>>>   {
>>>>   	int value;
>>>> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>>>>   		r->settings.core_commit_graph = value;
>>>>   	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>>>>   		r->settings.gc_write_commit_graph = value;
>>>> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
>>>> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>>>
>>>
>>> This is a "review comment" that is more than 2 years late X-<, but I
>>> noticed that this is used to muck with a structure that was
>>> initialized by filling it with \0377 bytes.
>>>
>>> +           /* Defaults */
>>> +           memset(&r->settings, -1, sizeof(r->settings));
>>>
>>> but the structure is full of "int" and "enum", so apparently this
>>> works only on 2's complement architecture.
>> This statement is true, but are there systems capable of running Git
>> which don't use two's complement?  Rust requires two's complement signed
>> integers, and there's a proposal[0] to the C++ working group to only
>> support two's complement because "[t]o the author’s knowledge no modern
>> machine uses both C++ and a signed integer representation other than
>> two’s complement".  That proposal goes on to note that none of MSVC,
>> GCC, or LLVM support other options.
>
> A similar proposal [1] is included in the draft of the next C standard
> [2]. As integer representation is implementation defined I believe
> this code has well defined behavior on 2's complement
> implementations. If an enum has no negative members then the compiler
> may choose an unsigned representation but even then the comparison to
> -1 is well defined.

That's informative, thanks.

> In this case I'm pretty sure the enums all have -1
> as a member so are signed. Using memset() to initialize the struct
> eases future maintenance when members are added or removed and seems
> to me to be a sensible design choice.

It's really not sensible at all in this particular case, as I think my
[1] which gets rid of the pattern convincingly argues.

I.e. the only reason it had a memset() of -1 after we'd already memset
it to 0 was because the function was tripping over itself and setting
defaults in the wrong order for no good reason.

I.e. it was doing things like (pseudocode):

    memset(&data, -1, sizeof(data));
    if_config_is_true_set("x.y", &data.x_y);
    if (data.x_y == -1)
        data.x_y = x_y_default();

When we can instead just do:

    data.x_y = x_y_default();
    set_if_cfg_key_exists("x.y", &data.x_y);

Which is how we e.g. handle options parsing, we have hardcoded defaults,
then read defaults from config, then set options, in that order.

We don't set options, then check if each value is still -1, if so read
config etc. Just read them in priority order, doing it any other way is
just make-work for something that's the equivalent of a simple
short-circuit || operation.
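
The "default first, then config" ordering can be sketched in a self-contained way. Everything here is an assumption made for illustration: the in-memory `demo_config_entry` table stands in for Git's config machinery, and `set_if_cfg_key_exists()` is the hypothetical helper named in the pseudocode, given an invented signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory config; Git's real config API is not used here. */
struct demo_config_entry {
	const char *key;
	int value;
};

/* Overwrites *dest only when key is present; returns 1 if it was found. */
int set_if_cfg_key_exists(const struct demo_config_entry *cfg, size_t n,
			  const char *key, int *dest)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!strcmp(cfg[i].key, key)) {
			*dest = cfg[i].value;
			return 1;
		}
	}
	return 0;
}

/* Hardcoded default first; the config then overrides it if the key is set. */
int resolve_x_y(const struct demo_config_entry *cfg, size_t n)
{
	int x_y = 1; /* assumed default for this sketch: on */

	set_if_cfg_key_exists(cfg, n, "x.y", &x_y);
	return x_y;
}
```

No sentinel value is needed in this ordering, because the member never holds anything other than the default or the configured value.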

Anyway, there are other cases where we need to read something and
distinguish e.g. false/true from "unset", and there a -1,0,1 tri-state
serves us well.

But even in those cases what repo-settings.c was doing of memsetting the
entire struct to -1 (2's complement aside) just makes for needlessly
hard to read code.

If we've got some members that need -1 defaults we should instead have
that in an *_INIT macro or equivalent. The pre-[1] repo-settings.c also
has code like this pseudocode:

    data.a_b = -1; /* default for a bi-state, not tri-state variable */
    set_if_cfg_key_exists("a.b", &data.a_b);
    if (data.a_b == -1)
        data.a_b = 1; /* on by default */

Which, urm, you can just do as:

    data.a_b = 1; /* on by default */
    set_if_cfg_key_exists("a.b", &data.a_b);

I.e. the setup for things that never wanted or cared about being set to
-1 was complicated by them needing to un-set themselves from a -1
default they never wanted.

Thus the anti-pattern, yes set defaults for some members to -1, but not
the entire struct. The only value we should memset a whole bag-of-stuff
config struct to is 0, as that's the only sensible default & plays well
with other C semantics.
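
A minimal sketch of the `*_INIT` idea, assuming a hypothetical `demo_settings` struct: only the member that genuinely needs an "unset" state gets -1, the plain boolean gets its real default, and anything not named in the initializer is zero-initialized by the language.

```c
/* Hypothetical settings struct; member names are illustrative only. */
struct demo_settings {
	int a_b;          /* plain boolean, on by default */
	int tri_state;    /* -1 = unset, 0 = false, 1 = true */
	const char *name; /* not named in the initializer, so it starts as a null pointer */
};

/* Hypothetical initializer in the style of an *_INIT macro. */
#define DEMO_SETTINGS_INIT { \
	.a_b = 1, \
	.tri_state = -1, \
}

/* Returns 1 while the tri-state member has not been configured. */
int demo_tri_state_is_unset(const struct demo_settings *s)
{
	return s->tri_state == -1;
}
```

A caller would write `struct demo_settings s = DEMO_SETTINGS_INIT;` and get typed defaults without any byte fill.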

1. https://lore.kernel.org/git/patch-v3-4.5-28286a61162-20210919T084703Z-avarab@gmail.com/

>
> Best Wishes
>
> Phillip
>
> [1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
> [2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf
>
>> Personally I am not aware of any modern processor which provides signed
>> integer types using other than two's complement.
>> [0]
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html
>> 

On the Git mailing list, Jeff King wrote (reply to this):

On Mon, Sep 20, 2021 at 03:30:38PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Thus the anti-pattern, yes set defaults for some members to -1, but not
> the entire struct. The only value we should memset a whole bag-of-stuff
> config struct to is 0, as that's the only sensible default & plays well
> with other C semantics.

FWIW, I agree. I had to scratch my head for a moment at why a memset of
"-1" would work at all on multi-byte types. I think it's better avoided
in the name of readability and obviousness, not to mention the trap it
leaves for items that don't sensibly initialize with it (like, say,
pointers).

As an aside, memset to 0 is _also_ undefined for pointers, but we long
ago decided not to care, as no real-world systems have a problem with
this.
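
A short sketch of why a byte fill is surprising on multi-byte types, and why it is a trap for pointers. The function names are made up for illustration. The pointer check compares object representations with `memcmp()` rather than reading the filled pointer, to stay clear of any question about invalid pointer values.

```c
#include <string.h>

/* memset() works per byte: filling an int with the byte 0x01 does not
 * yield 1 (with a 4-byte int it yields 0x01010101). */
int int_filled_with_byte_one(void)
{
	int v;

	memset(&v, 1, sizeof(v));
	return v;
}

/* Returns 1 if a pointer filled with 0xFF bytes has the same object
 * representation as a null pointer, 0 otherwise. */
int ff_filled_pointer_matches_null(void)
{
	void *filled;
	void *null_ptr = NULL;

	memset(&filled, -1, sizeof(filled));
	return memcmp(&filled, &null_ptr, sizeof(filled)) == 0;
}
```

The value -1 happens to work for integers only because every byte of a two's complement -1 is 0xFF; no such coincidence helps pointers on mainstream platforms.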

-Peff

On the Git mailing list, Phillip Wood wrote (reply to this):

On 20/09/2021 14:30, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Sep 20 2021, Phillip Wood wrote:
> 
>> On 20/09/2021 02:25, brian m. carlson wrote:
>>> On 2021-09-20 at 00:42:57, Junio C Hamano wrote:
>>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>>
>>>>> diff --git a/repo-settings.c b/repo-settings.c
>>>>> index 309577f6bc..d00b675687 100644
>>>>> --- a/repo-settings.c
>>>>> +++ b/repo-settings.c
>>>>> @@ -2,6 +2,8 @@
>>>>>    #include "config.h"
>>>>>    #include "repository.h"
>>>>>    
>>>>> +#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
>>>>> +
>>>>>    void prepare_repo_settings(struct repository *r)
>>>>>    {
>>>>>    	int value;
>>>>> @@ -16,6 +18,8 @@ void prepare_repo_settings(struct repository *r)
>>>>>    		r->settings.core_commit_graph = value;
>>>>>    	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
>>>>>    		r->settings.gc_write_commit_graph = value;
>>>>> +	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
>>>>> +	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
>>>>
>>>>
>>>> This is a "review comment" that is more than 2 years late X-<, but I
>>>> noticed that this is used to muck with a structure that was
>>>> initialized by filling it with \0377 bytes.
>>>>
>>>> +           /* Defaults */
>>>> +           memset(&r->settings, -1, sizeof(r->settings));
>>>>
>>>> but the structure is full of "int" and "enum", so apparently this
>>>> works only on 2's complement architecture.
>>> This statement is true, but are there systems capable of running Git
>>> which don't use two's complement?  Rust requires two's complement signed
>>> integers, and there's a proposal[0] to the C++ working group to only
>>> support two's complement because "[t]o the author’s knowledge no modern
>>> machine uses both C++ and a signed integer representation other than
>>> two’s complement".  That proposal goes on to note that none of MSVC,
>>> GCC, or LLVM support other options.
>>
>> A similar proposal [1] is included in the draft of the next C standard
>> [2]. As integer representation is implementation defined I believe
>> this code has well defined behavior on 2's complement
>> implementations. If an enum has no negative members then the compiler
>> may choose an unsigned representation but even then the comparison to
>> -1 is well defined.
> 
> That's informative, thanks.
> 
>> In this case I'm pretty sure the enums all have -1
>> as a member so are signed. Using memset() to initialize the struct
>> eases future maintenance when members are added or removed and seems
>> to me to be a sensible design choice.
> 
> It's really not sensible at all in this particular case, as I think my
> [1] which gets rid of the pattern convincingly argues.

I meant that it was a sensible way to initialize all the struct members 
to -1, I did not mean to comment either way on whether such an 
initialization was sensible. If we can just store the default and then 
update from the user's config that sounds sensible.

Best Wishes

Phillip

> I.e. the only reason it had a memset() of -1 after we'd already memset
> it to 0 was because the function was tripping over itself and setting
> defaults in the wrong order for no good reason.
> 
> I.e. it was doing things like (pseudocode);
> 
>      memset(&data, -1, ...)
>      if_config_is_true_set("x.y", data.x_y);
>      if (data.x_y == -1)
>          data.x_y = x_y_default();
> 
> When we can instead just do:
> 
>      data.x_y = x_y_default();
>      set_if_cfg_key_exists("x.y", &data.x_y);
> 
> Which is how we e.g. handle options parsing, we have hardcoded defaults,
> then read defaults from config, then set options, in that order.
> 
> We don't set options, then check if each value is still -1, if so read
> config etc. Just read them in priority order, doing it any other way is
> just make-work for something that's the equivalent of a simple
> short-circuit || operation.
> 
> Anyway, there are other cases where we need to read something and
> distinguish e.g. false/true from "unset", and there a -1,0,1 tri-state
> serves us well.
> 
> But even in those cases what repo-settings.c was doing of memsetting the
> entire struct to -1 (2's complement aside) just makes for needlessly
> hard to read code.
> 
> If we've got some members that need -1 defaults we should instead have
> that in an *_INIT macro or equivalent. The pre-[1] repo-settings.c also
> has code like this pseudocode:
> 
>      data.a_b = -1; /* default for a bi-state, not tri-state variable */
>      set_if_cfg_key_exists("a.b", &data.a_b);
>      if (data.a_b == -1)
>          data.a_b = 1; /* on by default */
> 
> Which, urm, you can just do as:
> 
>      data.a_b = 1; /* on by default */
>      set_if_cfg_key_exists("a.b", &data.a_b);
> 
> I.e. the setup for things that never wanted or cared about being set to
> -1 was complicated by them needing to un-set themselves from a -1
> default they never wanted.
> 
> Thus the anti-pattern, yes set defaults for some members to -1, but not
> the entire struct. The only value we should memset a whole bag-of-stuff
> config struct to is 0, as that's the only sensible default & plays well
> with other C semantics.
> 
> 1. https://lore.kernel.org/git/patch-v3-4.5-28286a61162-20210919T084703Z-avarab@gmail.com/
> 
>>
>> Best Wishes
>>
>> Phillip
>>
>> [1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
>> [2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf
>>
>>> Personally I am not aware of any modern processor which provides signed
>>> integer types using other than two's complement.
>>> [0]
>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html
>>>
> 

 core.commitGraph::
 	If true, then git will read the commit-graph file (if it exists)
-	to parse the graph structure of commits. Defaults to false. See
+	to parse the graph structure of commits. Defaults to true. See
 	linkgit:git-commit-graph[1] for more information.
 
 core.useReplaceRefs::
2 changes: 1 addition & 1 deletion Documentation/config/gc.txt
@@ -63,7 +63,7 @@ gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
 	linkgit:git-gc[1] is run. When using `git gc --auto`
 	the commit-graph will be updated if housekeeping is
-	required. Default is false. See linkgit:git-commit-graph[1]
+	required. Default is true. See linkgit:git-commit-graph[1]
 	for details.
 
 gc.logExpiry::
1 change: 1 addition & 0 deletions Makefile
@@ -964,6 +964,7 @@ LIB_OBJS += refspec.o
LIB_OBJS += ref-filter.o
On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Mon, 22 Jul 2019, Derrick Stolee via GitGitGadget wrote:

> diff --git a/builtin/gc.c b/builtin/gc.c
> index c18efadda5..243be2907b 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -27,6 +27,7 @@
>  #include "pack-objects.h"
>  #include "blob.h"
>  #include "tree.h"
> +#include "repo-settings.h"
>
>  #define FAILED_RUN "failed to run %s"
>
> @@ -41,7 +42,6 @@ static int aggressive_depth = 50;
>  static int aggressive_window = 250;
>  static int gc_auto_threshold = 6700;
>  static int gc_auto_pack_limit = 50;
> -static int gc_write_commit_graph;

I _really_ like that direction. Anything that removes global state will
improve Git's source code.

> [...]
> diff --git a/read-cache.c b/read-cache.c
> index c701f7f8b8..ee1aaa8917 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> [...]
> @@ -2765,7 +2767,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  	}
>
>  	if (!istate->version) {
> -		istate->version = get_index_format_default();
> +		istate->version = get_index_format_default(the_repository);

It is too bad that `read-cache.h` is not `the_repository`-free at the
moment...

>  		if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
>  			init_split_index(istate);
>  	}
> diff --git a/repo-settings.c b/repo-settings.c
> new file mode 100644
> index 0000000000..13a9128f62
> --- /dev/null
> +++ b/repo-settings.c
> @@ -0,0 +1,44 @@
> +#include "cache.h"
> +#include "repository.h"
> +#include "config.h"
> +#include "repo-settings.h"
> +
> +static int git_repo_config(const char *key, const char *value, void *cb)
> +{
> +	struct repo_settings *rs = (struct repo_settings *)cb;
> +
> +	if (!strcmp(key, "core.commitgraph")) {
> +		rs->core_commit_graph = git_config_bool(key, value);
> +		return 0;
> +	}
> +	if (!strcmp(key, "gc.writecommitgraph")) {
> +		rs->gc_write_commit_graph = git_config_bool(key, value);
> +		return 0;
> +	}
> +	if (!strcmp(key, "pack.usesparse")) {
> +		rs->pack_use_sparse = git_config_bool(key, value);
> +		return 0;
> +	}
> +	if (!strcmp(key, "index.version")) {
> +		rs->index_version = git_config_int(key, value);
> +		return 0;
> +	}

I would actually prefer to use the `repo_config_get_*()` family here.
That way, we really avoid re-parsing the config.

> +
> +	return 1;
> +}
> +
> +void prepare_repo_settings(struct repository *r)
> +{
> +	if (r->settings)
> +		return;
> +
> +	r->settings = xmalloc(sizeof(*r->settings));
> +
> +	/* Defaults */
> +	r->settings->core_commit_graph = -1;
> +	r->settings->gc_write_commit_graph = -1;
> +	r->settings->pack_use_sparse = -1;
> +	r->settings->index_version = -1;
> +
> +	repo_config(r, git_repo_config, r->settings);
> +}
> diff --git a/repo-settings.h b/repo-settings.h
> new file mode 100644
> index 0000000000..1151c2193a
> --- /dev/null
> +++ b/repo-settings.h
> @@ -0,0 +1,15 @@
> +#ifndef REPO_SETTINGS_H
> +#define REPO_SETTINGS_H
> +
> +struct repo_settings {
> +	int core_commit_graph;
> +	int gc_write_commit_graph;
> +	int pack_use_sparse;
> +	int index_version;
> +};
> +
> +struct repository;
> +
> +void prepare_repo_settings(struct repository *r);

Hmm. I can see that you wanted to encapsulate this, but I do not really
agree that this needs to be encapsulated away from `repository.h`. I'd
rather declare `struct repo_settings` in `repository.h` and then make
the `settings` a field of that type (as opposed to a pointer to that
type). In general, I like to avoid unnecessary `malloc()`s, and this
here instance is one of them.

Thanks,
Dscho

> +
> +#endif /* REPO_SETTINGS_H */
> diff --git a/repository.h b/repository.h
> index 4fb6a5885f..352afc9cd8 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -4,6 +4,7 @@
>  #include "path.h"
>
>  struct config_set;
> +struct repo_settings;
>  struct git_hash_algo;
>  struct index_state;
>  struct lock_file;
> @@ -72,6 +73,8 @@ struct repository {
>  	 */
>  	char *submodule_prefix;
>
> +	struct repo_settings *settings;
> +
>  	/* Subsystems */
>  	/*
>  	 * Repository's config which contains key-value pairs from the usual
> --
> gitgitgadget
>
>

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Wed, 24 Jul 2019, Derrick Stolee via GitGitGadget wrote:

> diff --git a/repo-settings.h b/repo-settings.h
> new file mode 100644
> index 0000000000..89fb0159bf
> --- /dev/null
> +++ b/repo-settings.h
> @@ -0,0 +1,17 @@
> +#ifndef REPO_SETTINGS_H
> +#define REPO_SETTINGS_H
> +
> +struct repo_settings {
> +	int core_commit_graph;
> +	int gc_write_commit_graph;
> +
> +	int index_version;
> +
> +	int pack_use_sparse;
> +};
> +
> +struct repository;
> +
> +void prepare_repo_settings(struct repository *r);
> +
> +#endif /* REPO_SETTINGS_H */
> diff --git a/repository.h b/repository.h
> index 4fb6a5885f..a817486825 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -2,8 +2,10 @@
>  #define REPOSITORY_H
>
>  #include "path.h"
> +#include "repo-settings.h"

I still think that the `repo_settings` struct could just as easily be
declared in `repository.h`. No need to invent a new header file.

>
>  struct config_set;
> +struct repo_settings;

In any case, this is no longer necessary.

>  struct git_hash_algo;
>  struct index_state;
>  struct lock_file;
> @@ -72,6 +74,9 @@ struct repository {
>  	 */
>  	char *submodule_prefix;
>
> +	int settings_initialized;

Or maybe

	unsigned settings_initialized:1;

?

> +	struct repo_settings settings;
> +

Or maybe even fold the `initialized` flag into that struct?

Thanks,
Dscho
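
As a rough sketch of the two ideas floated above (a one-bit flag, folded into the settings struct), the lazy initialization could look like the following. This is a standalone illustration, not the patch: the structs are trimmed down, the config lookup is replaced by hard-coded "unset" values, and `parse_count` is a hypothetical counter added only so the effect of the guard is observable.

```c
struct repo_settings {
	unsigned initialized:1;	/* one-bit flag folded into the struct */
	int core_commit_graph;
	int index_version;
};

struct repository {
	struct repo_settings settings;
};

/* Hypothetical counter: how many times the settings were actually parsed. */
static int parse_count;

void prepare_repo_settings(struct repository *r)
{
	if (r->settings.initialized)
		return;

	/* -1 marks "unset"; a real implementation would consult config here. */
	r->settings.core_commit_graph = -1;
	r->settings.index_version = -1;

	r->settings.initialized = 1;
	parse_count++;
}
```

Calling `prepare_repo_settings()` repeatedly on a zero-initialized `struct repository` does the work once and returns early afterwards.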

>  	/* Subsystems */
>  	/*
>  	 * Repository's config which contains key-value pairs from the usual
> --
> gitgitgadget
>
>

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/repository.h b/repository.h
> index 4fb6a5885f..2bb2bc3eea 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -4,6 +4,7 @@
>  #include "path.h"
>  
>  struct config_set;
> +struct repo_settings;

Given that the next hunk introduces the real thing, and nobody
refers to it until then, I do not see why we want to have a forward
declaration.
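
For readers less familiar with the rule behind this remark, a small standalone sketch (trimmed structs, not the actual header) shows when a forward declaration is enough and when it is not:

```c
/* Forward declaration: enough for pointers, the type stays incomplete. */
struct config_set;

/* Full definition: required before the struct can be a by-value member. */
struct repo_settings {
	int initialized;
	int index_version;
};

struct repository {
	struct config_set *config;	/* pointer to an incomplete type is fine */
	struct repo_settings settings;	/* needs the full definition above */
};
```

Once the full definition precedes its first use, an extra `struct repo_settings;` line adds nothing.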

>  struct git_hash_algo;
>  struct index_state;
>  struct lock_file;
> @@ -11,6 +12,17 @@ struct pathspec;
>  struct raw_object_store;
>  struct submodule_cache;
>  
> +struct repo_settings {
> +	int initialized;
> +
> +	int core_commit_graph;
> +	int gc_write_commit_graph;
> +
> +	int index_version;
> +
> +	int pack_use_sparse;
> +};
> +
>  struct repository {
>  	/* Environment */
>  	/*
> @@ -72,6 +84,8 @@ struct repository {
>  	 */
>  	char *submodule_prefix;
>  
> +	struct repo_settings settings;
> +
>  	/* Subsystems */
>  	/*
>  	 * Repository's config which contains key-value pairs from the usual
> @@ -157,5 +171,6 @@ int repo_read_index_unmerged(struct repository *);
>   */
>  void repo_update_index_if_able(struct repository *, struct lock_file *);
>  
> +void prepare_repo_settings(struct repository *r);
>  
>  #endif /* REPOSITORY_H */

 LIB_OBJS += remote.o
 LIB_OBJS += replace-object.o
+LIB_OBJS += repo-settings.o
 LIB_OBJS += repository.o
 LIB_OBJS += rerere.o
 LIB_OBJS += resolve-undo.o
12 changes: 5 additions & 7 deletions builtin/gc.c
@@ -41,7 +41,6 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
-static int gc_write_commit_graph;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -148,7 +147,6 @@ static void gc_config(void)
 	git_config_get_int("gc.aggressivedepth", &aggressive_depth);
 	git_config_get_int("gc.auto", &gc_auto_threshold);
 	git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
-	git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph);
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_get_expiry("gc.pruneexpire", &prune_expire);
 	git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
@@ -685,11 +683,11 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 	}

-	if (gc_write_commit_graph &&
-	    write_commit_graph_reachable(get_object_directory(),
-					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0,
-					 NULL))
-		return 1;
+	prepare_repo_settings(the_repository);
+	if (the_repository->settings.gc_write_commit_graph == 1)
+		write_commit_graph_reachable(get_object_directory(),
+					     !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0,
+					     NULL);

 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
8 changes: 4 additions & 4 deletions builtin/pack-objects.c
@@ -2709,10 +2709,6 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
 	}
-	if (!strcmp(k, "pack.usesparse")) {
-		sparse = git_config_bool(k, v);
-		return 0;
-	}
 	if (!strcmp(k, "pack.threads")) {
 		delta_search_threads = git_config_int(k, v);
 		if (delta_search_threads < 0)
@@ -3332,6 +3328,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	read_replace_refs = 0;

 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", 0);
+	prepare_repo_settings(the_repository);
+	if (!sparse && the_repository->settings.pack_use_sparse != -1)
+		sparse = the_repository->settings.pack_use_sparse;
+
 	reset_pack_idx_option(&pack_idx_opts);
 	git_config(git_pack_config, NULL);

6 changes: 3 additions & 3 deletions commit-graph.c
@@ -466,7 +466,6 @@ static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
 static int prepare_commit_graph(struct repository *r)
 {
 	struct object_directory *odb;
-	int config_value;

 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
 		die("dying as requested by the '%s' variable on commit-graph load!",
@@ -476,9 +475,10 @@ static int prepare_commit_graph(struct repository *r)
 		return !!r->objects->commit_graph;
 	r->objects->commit_graph_attempted = 1;

+	prepare_repo_settings(r);
+
 	if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-	    (repo_config_get_bool(r, "core.commitgraph", &config_value) ||
-	    !config_value))
+	    r->settings.core_commit_graph != 1)
 		/*
 		 * This repository is not configured to use commit graphs, so
 		 * do not load one. (But report commit_graph_attempted anyway
11 changes: 6 additions & 5 deletions read-cache.c
@@ -1599,16 +1599,17 @@ struct cache_entry *refresh_cache_entry(struct index_state *istate,

 #define INDEX_FORMAT_DEFAULT 3

-static unsigned int get_index_format_default(void)
+static unsigned int get_index_format_default(struct repository *r)
 {
 	char *envversion = getenv("GIT_INDEX_VERSION");
 	char *endp;
-	int value;
 	unsigned int version = INDEX_FORMAT_DEFAULT;

 	if (!envversion) {
-		if (!git_config_get_int("index.version", &value))
-			version = value;
+		prepare_repo_settings(r);
+
+		if (r->settings.index_version >= 0)
+			version = r->settings.index_version;
 		if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
 			warning(_("index.version set, but the value is invalid.\n"
 				  "Using version %i"), INDEX_FORMAT_DEFAULT);
@@ -2765,7 +2766,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	}

 	if (!istate->version) {
-		istate->version = get_index_format_default();
+		istate->version = get_index_format_default(the_repository);
 		if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
 			init_split_index(istate);
 	}
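
The config branch of the version selection in `read-cache.c` can be sketched in isolation as below. This is a simplified illustration rather than the function from the patch: `pick_index_version()` is a hypothetical name, the `GIT_INDEX_VERSION` environment path and the warning are left out, and the bounds 2 and 4 are assumed values for `INDEX_FORMAT_LB` and `INDEX_FORMAT_UB`.

```c
#define INDEX_FORMAT_LB 2	/* assumed lower bound */
#define INDEX_FORMAT_UB 4	/* assumed upper bound */
#define INDEX_FORMAT_DEFAULT 3

/*
 * `configured` is the cached index.version setting, or -1 if unset.
 * Out-of-range values fall back to the default.
 */
unsigned int pick_index_version(int configured)
{
	unsigned int version = INDEX_FORMAT_DEFAULT;

	if (configured >= 0)
		version = configured;
	if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version)
		version = INDEX_FORMAT_DEFAULT;
	return version;
}
```

Under these assumed bounds, 0 and 1 are rejected. Those are the only values a boolean parse can produce, so `index.version` has to be read as an integer for the setting to take effect.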
29 changes: 29 additions & 0 deletions repo-settings.c
@@ -0,0 +1,29 @@
+#include "cache.h"
+#include "config.h"
+#include "repository.h"
+
+#define UPDATE_DEFAULT_BOOL(s,v) do { if (s == -1) { s = v; } } while(0)
+
+void prepare_repo_settings(struct repository *r)
+{
+	int value;
+
+	if (r->settings.initialized)
+		return;
+
+	/* Defaults */
+	memset(&r->settings, -1, sizeof(r->settings));
+
+	if (!repo_config_get_bool(r, "core.commitgraph", &value))
+		r->settings.core_commit_graph = value;
+	if (!repo_config_get_bool(r, "gc.writecommitgraph", &value))
+		r->settings.gc_write_commit_graph = value;
+	UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1);
+	UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1);
+
+	if (!repo_config_get_int(r, "index.version", &value))
+		r->settings.index_version = value;
+
+	if (!repo_config_get_bool(r, "pack.usesparse", &value))
+		r->settings.pack_use_sparse = value;
+}
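
The "-1 means unset, then apply defaults" pattern used in `prepare_repo_settings()` can be tried out on its own with the sketch below. It makes several assumptions: `struct settings`, `prepare()` and `fake_get_bool()` are hypothetical stand-ins (the latter for `repo_config_get_bool()`, pretending only `core.commitgraph` is configured), and the macro is a parenthesized variant of the one in the patch, not a copy of it.

```c
#include <string.h>

/* Parenthesized variant of the patch's macro; (s) must be an lvalue. */
#define UPDATE_DEFAULT_BOOL(s, v) do { if ((s) == -1) (s) = (v); } while (0)

struct settings {
	int core_commit_graph;
	int gc_write_commit_graph;
	int pack_use_sparse;
};

/*
 * Hypothetical stand-in for repo_config_get_bool(): returns 0 and stores
 * the value if the key is "set", non-zero otherwise.
 */
static int fake_get_bool(const char *key, int *dest)
{
	if (!strcmp(key, "core.commitgraph")) {
		*dest = 0;
		return 0;
	}
	return 1;
}

void prepare(struct settings *s)
{
	int value;

	/* All-ones bytes read back as -1 in every int field (two's complement). */
	memset(s, -1, sizeof(*s));

	if (!fake_get_bool("core.commitgraph", &value))
		s->core_commit_graph = value;
	if (!fake_get_bool("gc.writecommitgraph", &value))
		s->gc_write_commit_graph = value;

	/* Only fields still at -1 pick up the default. */
	UPDATE_DEFAULT_BOOL(s->core_commit_graph, 1);
	UPDATE_DEFAULT_BOOL(s->gc_write_commit_graph, 1);
}
```

After `prepare()`, the explicitly configured field keeps its value, the unconfigured boolean picks up the default, and a field with no default stays at -1. In the patch itself the same `memset()` also turns `initialized` into -1, which is non-zero and therefore makes the early return fire on later calls.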
14 changes: 14 additions & 0 deletions repository.h
@@ -11,6 +11,17 @@ struct pathspec;
 struct raw_object_store;
 struct submodule_cache;

+struct repo_settings {
+	int initialized;
+
+	int core_commit_graph;
+	int gc_write_commit_graph;
+
+	int index_version;
+
+	int pack_use_sparse;
+};
+
 struct repository {
 	/* Environment */
 	/*
@@ -72,6 +83,8 @@ struct repository {
 	 */
 	char *submodule_prefix;

+	struct repo_settings settings;
+
 	/* Subsystems */
 	/*
 	 * Repository's config which contains key-value pairs from the usual
@@ -157,5 +170,6 @@ int repo_read_index_unmerged(struct repository *);
  */
 void repo_update_index_if_able(struct repository *, struct lock_file *);

+void prepare_repo_settings(struct repository *r);

 #endif /* REPOSITORY_H */
2 changes: 1 addition & 1 deletion t/t0410-partial-clone.sh
@@ -234,7 +234,7 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '

 	git -C repo config core.repositoryformatversion 1 &&
 	git -C repo config extensions.partialclone "arbitrary string" &&
-	GIT_TEST_COMMIT_GRAPH=0 git -C repo rev-list --exclude-promisor-objects --objects bar >out &&
+	GIT_TEST_COMMIT_GRAPH=0 git -C repo -c core.commitGraph=false rev-list --exclude-promisor-objects --objects bar >out &&
 	grep $(git -C repo rev-parse bar) out &&
 	! grep $FOO out
 '
4 changes: 2 additions & 2 deletions t/t5307-pack-missing-commit.sh
@@ -24,11 +24,11 @@ test_expect_success 'check corruption' '
 '

 test_expect_success 'rev-list notices corruption (1)' '
-	test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git rev-list HEAD
+	test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git -c core.commitGraph=false rev-list HEAD
 '

 test_expect_success 'rev-list notices corruption (2)' '
-	test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git rev-list --objects HEAD
+	test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git -c core.commitGraph=false rev-list --objects HEAD
 '

 test_expect_success 'pack-objects notices corruption' '
2 changes: 2 additions & 0 deletions t/t5324-split-commit-graph.sh
@@ -8,6 +8,7 @@ GIT_TEST_COMMIT_GRAPH=0
 test_expect_success 'setup repo' '
 	git init &&
 	git config core.commitGraph true &&
+	git config gc.writeCommitGraph false &&
 	infodir=".git/objects/info" &&
 	graphdir="$infodir/commit-graphs" &&
 	test_oid_init
@@ -332,6 +333,7 @@ test_expect_success 'split across alternate where alternate is not split' '
 	git clone --no-hardlinks . alt-split &&
 	(
 		cd alt-split &&
+		rm -f .git/objects/info/commit-graph &&
 		echo "$(pwd)"/../.git/objects >.git/objects/info/alternates &&
 		test_commit 18 &&
 		git commit-graph write --reachable --split &&
2 changes: 1 addition & 1 deletion t/t6011-rev-list-with-bad-commit.sh
@@ -42,7 +42,7 @@ test_expect_success 'corrupt second commit object' \
 '

 test_expect_success 'rev-list should fail' '
-	test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git rev-list --all > /dev/null
+	test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git -c core.commitGraph=false rev-list --all > /dev/null
 '

 test_expect_success 'git repack _MUST_ fail' \
6 changes: 3 additions & 3 deletions t/t6501-freshen-objects.sh
@@ -137,7 +137,7 @@ test_expect_success 'do not complain about existing broken links (commit)' '
 	some message
 	EOF
 	commit=$(git hash-object -t commit -w broken-commit) &&
-	git gc 2>stderr &&
+	git gc -q 2>stderr &&
 	verbose git cat-file -e $commit &&
 	test_must_be_empty stderr
 '
@@ -147,7 +147,7 @@ test_expect_success 'do not complain about existing broken links (tree)' '
 	100644 blob 0000000000000000000000000000000000000003 foo
 	EOF
 	tree=$(git mktree --missing <broken-tree) &&
-	git gc 2>stderr &&
+	git gc -q 2>stderr &&
 	git cat-file -e $tree &&
 	test_must_be_empty stderr
 '
@@ -162,7 +162,7 @@ test_expect_success 'do not complain about existing broken links (tag)' '
 	this is a broken tag
 	EOF
 	tag=$(git hash-object -t tag -w broken-tag) &&
-	git gc 2>stderr &&
+	git gc -q 2>stderr &&
 	git cat-file -e $tag &&
 	test_must_be_empty stderr
 '