Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
midx: add MIDX_PROGRESS flag
Add the MIDX_PROGRESS flag and update the
write|verify|expire|repack functions in midx.h
to accept a flags parameter.  The MIDX_PROGRESS
flag indicates whether the caller of the function
would like progress information to be displayed.
This patch only changes the method prototypes
and does not change the functionality. The
functionality change will be handled by a later patch.

Signed-off-by: William Baker <William.Baker@microsoft.com>
  • Loading branch information
wilbaker committed Oct 21, 2019
commit e7b1cc633bc8f5dbdb4f113456d20f1f11476662
8 changes: 4 additions & 4 deletions builtin/multi-pack-index.c
Original file line number Diff line number Diff line change
Expand Up @@ -47,16 +47,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
trace2_cmd_mode(argv[0]);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"William Baker via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: William Baker <William.Baker@microsoft.com>
>
> Signed-off-by: William Baker <William.Baker@microsoft.com>
> ---
>  builtin/multi-pack-index.c |  8 ++++----
>  builtin/repack.c           |  2 +-
>  midx.c                     |  8 ++++----
>  midx.h                     | 10 ++++++----
>  4 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index b1ea1a6aa1..e86b8cd17d 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -47,16 +47,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
>  	trace2_cmd_mode(argv[0]);
>  
>  	if (!strcmp(argv[0], "repack"))
> -		return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size);
> +		return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size, 0);
>  	if (opts.batch_size)
>  		die(_("--batch-size option is only for 'repack' subcommand"));
>  
>  	if (!strcmp(argv[0], "write"))
> -		return write_midx_file(opts.object_dir);
> +		return write_midx_file(opts.object_dir, 0);
>  	if (!strcmp(argv[0], "verify"))
> -		return verify_midx_file(the_repository, opts.object_dir);
> +		return verify_midx_file(the_repository, opts.object_dir, 0);
>  	if (!strcmp(argv[0], "expire"))
> -		return expire_midx_packs(the_repository, opts.object_dir);
> +		return expire_midx_packs(the_repository, opts.object_dir, 0);

Hmm, I actually would have expected to see a new .progress field in
the opts structure and these lower-level functions would start
taking a pointer to the whole opts structure, instead of just a
pointer to a single member opts.object_dir.  That way, we can later
extend and enrich the interface between this caller and the
underlying functions without much patch noise in the dispatch layer
(i.e. here).

This step is meant to be just preparing by extending the interface
and passing the new argument throughout the callchain, I believe,
and it looks correctly done, assuming that we are OK to take this
"add a separate 'progress' parameter, when we need more parameters
later, the interface will gain more and more parameters" approach.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> This step is meant to be just preparing by extending the interface
> and passing the new argument throughout the callchain, I believe,
> and it looks correctly done, assuming that we are OK to take this
> "add a separate 'progress' parameter, when we need more parameters
> later, the interface will gain more and more parameters" approach.

After re-reading the remainder of the series, I think the structure
this patch series takes (i.e. adding a new parameter to these callees)
is better than the alternative (i.e. making sure everybody takes the
pointer to the opts structure).

Thanks.  I couldn't review the proposed log messages of these
commits, which were unreadable, at all, though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Fri, Sep 20, 2019 at 09:53:48AM -0700, William Baker via GitGitGadget wrote:
> diff --git a/midx.h b/midx.h
> index f0ae656b5d..e6fa356b5c 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -37,6 +37,8 @@ struct multi_pack_index {
>  	char object_dir[FLEX_ARRAY];
>  };
>  
> +#define MIDX_PROGRESS     (1 << 0)

Please consider using an enum.

A preprocessor constant is just that: a number.  It is shown as a
number while debugging, and there is no clear indication what related
values belong in the same group (apart from the prefix of the macro
name), or what possible values an 'unsigned flags' variable might have
(though in this particular case there is only a single possible
value...).  

An enum, however, is much friendlier to humans when debugging, because
even 'gdb' shows the value using the symbolic names, e.g.:

  write_commit_graph_reachable (obj_dir=0x9ab870 ".git/objects", 
      flags=(COMMIT_GRAPH_WRITE_APPEND | COMMIT_GRAPH_WRITE_PROGRESS), 
      split_opts=0x9670e0 <split_opts>) at commit-graph.c:1141

and it's quite clear what values a variable of type 'enum
commit_graph_write_flags' can have.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 9/21/19 5:11 AM, SZEDER Gábor wrote:
>>  
>> +#define MIDX_PROGRESS     (1 << 0)
> 
> Please consider using an enum.
> 
> A preprocessor constant is just that: a number.  It is shown as a
> number while debugging, and there is no clear indication what related
> values belong in the same group (apart from the prefix of the macro
> name), or what possible values an 'unsigned flags' variable might have
> (though in this particular case there is only a single possible
> value...).  
> 
> An enum, however, is much friendlier to humans when debugging, because
> even 'gdb' shows the value using the symbolic names, e.g.:

Thanks for this suggestion.  I agree on the benefits of using an enum
here and will make the switch in my next set of changes.

-William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

SZEDER G=C3=A1bor <szeder.dev@gmail.com> writes:

> On Fri, Sep 20, 2019 at 09:53:48AM -0700, William Baker via GitGitGadge=
t wrote:
>> diff --git a/midx.h b/midx.h
>> index f0ae656b5d..e6fa356b5c 100644
>> --- a/midx.h
>> +++ b/midx.h
>> @@ -37,6 +37,8 @@ struct multi_pack_index {
>>  	char object_dir[FLEX_ARRAY];
>>  };
>> =20
>> +#define MIDX_PROGRESS     (1 << 0)
>
> Please consider using an enum.

If they are used by assiging one of their values, definitely a good
idea to use an enum.  Are debuggers clever enough that they can
tell, when they see something like this:

	enum gress {
		PROGRESS =3D 1,
		REGRESS =3D 2,
	};

	void func(enum gress v);

	...

        void caller(void)
	{
		func(PROGRESS | REGRESS);
		func(PROGRESS + REGRESS);
		func(PROGRESS * 3);
	}

how caller came about to give 3?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 9/27/19 8:40 PM, Junio C Hamano wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
> 
>>> +#define MIDX_PROGRESS     (1 << 0)
>>
>> Please consider using an enum.
> 
> If they are used by assiging one of their values, definitely a good
> idea to use an enum.  Are debuggers clever enough that they can
> tell, when they see something like this:
> 
> 	enum gress {
> 		PROGRESS = 1,
> 		REGRESS = 2,
> 	};
> 
> 	void func(enum gress v);
> 
> 	...
> 
>         void caller(void)
> 	{
> 		func(PROGRESS | REGRESS);
> 		func(PROGRESS + REGRESS);
> 		func(PROGRESS * 3);
> 	}
> 
> how caller came about to give 3?
> 

My debugger was not smart enough to figure out what flags were combined
to give the value of 3 in this example.  

I saw that the code base is currently a mix of #define and enums when it
comes to flags  (e.g. dir_struct.flags and rebase_options.flags are both
enums) and so using one here would not be something new stylistically.

Although my debugger might not be the smartest, I haven't noticed any
downsides to switching this to an enum.

Thanks,
William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

William Baker <williamtbakeremail@gmail.com> writes:

> I saw that the code base is currently a mix of #define and enums when it
> comes to flags  (e.g. dir_struct.flags and rebase_options.flags are both
> enums) and so using one here would not be something new stylistically.

Yes.  But it is a different matter to spread a bad practice to
parts of the code that haven't been contaminated by it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Tue, Oct 08, 2019 at 01:30:34PM +0900, Junio C Hamano wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
> 
> >> 		func(PROGRESS | REGRESS);
> >> 		func(PROGRESS + REGRESS);
> >> 		func(PROGRESS * 3);
> >> 	}
> >> 
> >> how caller came about to give 3?
> >
> > No, they tend to show (PROGRESS | REGRESS), at least both gdb and lldb
> > do.

I was wrong here, gdb does this, but lldb, unfortunately, doesn't; see
my other reply in this thread.

> OK.
> 
> >  If the enum has only constants with power-of-two values, then that
> > is the right way to write it, and the other two are asking for trouble
> 
> If the programmer and the debugger knows the constants are used to
> represent bits that can be or'ed together, what you say is correct,
> but that is entirely irrelevant.
> 
> What I was worried about is that the constants that are used to
> represent something that are *NOT* set of bits (hence "PROGRESS * 3"
> may be perfectly a reasonable thing for such an application)

I don't really see how that could be reasonable, it's prone to break
when changing the values of the enum constants.

> may be
> mistaken by an overly clever debugger and "3" may end up getting
> shown as "PROGRESS | REGRESS".  When there are only two constants
> (PROGRESS=1 and REGRESS=2), we humans nor debuggers can tell if that
> is to represent two bits that can be or'ed together, or it is an
> enumeration.
> 
> Until we gain the third constant, that is, at which time the third
> one may likely be 3 (if enumeration) or 4 (if bits).

Humans benefit from context: they understand the name of the enum type
(e.g. does it end with "_flags"?), the name of the enum constants, and
the comment above the enum's definition (if any), and can then infer
whether those constants represent OR-able bits or not.  If they can't
find this out, then that enum is poorly named and/or documented, which
should be fixed.  As for the patch that I originally commented on, I
would expect the enum to be called e.g. 'midx_flags', and thus already
with that single constant in there it'll be clear that it is intended
as a collection of related OR-able bits.

As for the debugger, if it sees a variable of an enum type whose value
doesn't match any of the enum constants, then there are basically
three possibilities:

  - All constants in that enum have power-of-two values.  In this case
    it's reasonable from the debugger to assume that those constants
    are OR'ed together, and is extremely helpful to display the value
    that way.

  - The constants are just a set of values (1, 2, 3, 42, etc).  In
    this case the variable shouldn't have a value that doesn't match
    one of the constants in the first place, and I would first suspect
    that the program might be buggy.

  - A "mostly" power-of-two enum might contain shorthand constants for
    combinations of a set of other constants, e.g.:

      enum flags {
              BIT0 = (1 << 0),
              BIT1 = (1 << 1),
              BIT2 = (1 << 2),

              FIRST_TWO = (BIT0 | BIT1),
      };
      enum flags f0 = BIT0;
      enum flags f1 = BIT0 | BIT1;
      enum flags f2 = BIT0 | BIT2;
      enum flags f3 = BIT0 | BIT1 | BIT2;

    In this case, sadly, gdb shows only matching constants:

      (gdb) p f0
      $1 = BIT0
      (gdb) p f1
      $2 = FIRST_TWO
      (gdb) p f2
      $3 = 5
      (gdb) p f3
      $4 = 7

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 10/8/19 6:32 PM, SZEDER Gábor wrote:
>>> No, they tend to show (PROGRESS | REGRESS), at least both gdb and lldb
>>> do.
> 
> I was wrong here, gdb does this, but lldb, unfortunately, doesn't; see
> my other reply in this thread.
>
>> What I was worried about is that the constants that are used to
>> represent something that are *NOT* set of bits (hence "PROGRESS * 3"
>> may be perfectly a reasonable thing for such an application)
> 
> I don't really see how that could be reasonable, it's prone to break
> when changing the values of the enum constants.
> 
>> may be
>> mistaken by an overly clever debugger and "3" may end up getting
>> shown as "PROGRESS | REGRESS".  When there are only two constants
>> (PROGRESS=1 and REGRESS=2), we humans nor debuggers can tell if that
>> is to represent two bits that can be or'ed together, or it is an
>> enumeration.
>>
>> Until we gain the third constant, that is, at which time the third
>> one may likely be 3 (if enumeration) or 4 (if bits).
> 
> Humans benefit from context: they understand the name of the enum type
> (e.g. does it end with "_flags"?), the name of the enum constants, and
> the comment above the enum's definition (if any), and can then infer
> whether those constants represent OR-able bits or not.  If they can't
> find this out, then that enum is poorly named and/or documented, which
> should be fixed.  As for the patch that I originally commented on, I
> would expect the enum to be called e.g. 'midx_flags', and thus already
> with that single constant in there it'll be clear that it is intended
> as a collection of related OR-able bits.
> 
> As for the debugger, if it sees a variable of an enum type whose value
> doesn't match any of the enum constants, then there are basically
> three possibilities:
> 
>   - All constants in that enum have power-of-two values.  In this case
>     it's reasonable from the debugger to assume that those constants
>     are OR'ed together, and is extremely helpful to display the value
>     that way.
> 
>   - The constants are just a set of values (1, 2, 3, 42, etc).  In
>     this case the variable shouldn't have a value that doesn't match
>     one of the constants in the first place, and I would first suspect
>     that the program might be buggy.
> 
>   - A "mostly" power-of-two enum might contain shorthand constants for
>     combinations of a set of other constants, e.g.:
> 
>       enum flags {
>               BIT0 = (1 << 0),
>               BIT1 = (1 << 1),
>               BIT2 = (1 << 2),
> 
>               FIRST_TWO = (BIT0 | BIT1),
>       };
>       enum flags f0 = BIT0;
>       enum flags f1 = BIT0 | BIT1;
>       enum flags f2 = BIT0 | BIT2;
>       enum flags f3 = BIT0 | BIT1 | BIT2;
> 
>     In this case, sadly, gdb shows only matching constants:
> 
>       (gdb) p f0
>       $1 = BIT0
>       (gdb) p f1
>       $2 = FIRST_TWO
>       (gdb) p f2
>       $3 = 5
>       (gdb) p f3
>       $4 = 7
> 

I believe this is the last open thread/discussion on the "midx: add MIDX_PROGRESS
flag" patch series.  

A quick summary of where the discussion stands:

- The patch series currently uses #define for the new MIDX_PROGRESS flag.
- The git codebase uses both #defines and enums for flags.
- The debugger always shows the enum value's name if there is a match, if values
  are OR-ed together there a few possibilities (see discussion above and earlier
  in the thread).
- There are concerns regarding misinterpreting constants that are not a set of
  bits (e.g. "PROGRESS * 3").

Are there any other details I can provide that would help in reaching a
conclusion as to how to proceed?

At this time there are no other MIDX_* flags and so there's always the option
to revisit the best way to handle multiple MIDX_* flags if/when additional
flags are added.

Thanks,
William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

William Baker <williamtbakeremail@gmail.com> writes:

> At this time there are no other MIDX_* flags and so there's always the option
> to revisit the best way to handle multiple MIDX_* flags if/when additional
> flags are added.

I do not care too deeply either way, but if you wrote it in one way,
how about completing the series without changing it in the middle,
and leave the clean-ups to a follow-up series (if needed)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 10/15/19 7:09 PM, Junio C Hamano wrote:
>> At this time there are no other MIDX_* flags and so there's always the option
>> to revisit the best way to handle multiple MIDX_* flags if/when additional
>> flags are added.
> 
> I do not care too deeply either way, but if you wrote it in one way,
> how about completing the series without changing it in the middle,
> and leave the clean-ups to a follow-up series (if needed)?


That plan sounds good to me.  The most recent series (v3) should be ready
to go, I don't believe there is any outstanding feedback to address.

Thanks,
William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 10/16/19 12:48 PM, William Baker wrote:>> I do not care too deeply either way, but if you wrote it in one way,
>> how about completing the series without changing it in the middle,
>> and leave the clean-ups to a follow-up series (if needed)?
> 
> 
> That plan sounds good to me.  The most recent series (v3) should be ready
> to go, I don't believe there is any outstanding feedback to address.

Follow-up question on this patch series.

I noticed there is a branch named 'wb/midx-progress' in
https://github.com/gitster/git but it does not appear to have the commits
from this patch series and I have not seen it mentioned in
"What's cooking in git.git" emails.

Is there anything else I should do to get these changes picked up in 'pu'?

Thanks again,
William


if (!strcmp(argv[0], "repack"))
return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size);
return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size, 0);
if (opts.batch_size)
die(_("--batch-size option is only for 'repack' subcommand"));

if (!strcmp(argv[0], "write"))
return write_midx_file(opts.object_dir);
return write_midx_file(opts.object_dir, 0);
if (!strcmp(argv[0], "verify"))
return verify_midx_file(the_repository, opts.object_dir);
return verify_midx_file(the_repository, opts.object_dir, 0);
if (!strcmp(argv[0], "expire"))
return expire_midx_packs(the_repository, opts.object_dir);
return expire_midx_packs(the_repository, opts.object_dir, 0);

die(_("unrecognized subcommand: %s"), argv[0]);
}
2 changes: 1 addition & 1 deletion builtin/repack.c
Original file line number Diff line number Diff line change
Expand Up @@ -562,7 +562,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
remove_temporary_files();

if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
write_midx_file(get_object_directory());
write_midx_file(get_object_directory(), 0);

string_list_clear(&names, 0);
string_list_clear(&rollback, 0);
Expand Down
8 changes: 4 additions & 4 deletions midx.c
Original file line number Diff line number Diff line change
Expand Up @@ -1016,7 +1016,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
return result;
}

int write_midx_file(const char *object_dir)
int write_midx_file(const char *object_dir, unsigned flags)
{
return write_midx_internal(object_dir, NULL, NULL);
}
Expand Down Expand Up @@ -1076,7 +1076,7 @@ static int compare_pair_pos_vs_id(const void *_a, const void *_b)
display_progress(progress, _n); \
} while (0)

int verify_midx_file(struct repository *r, const char *object_dir)
int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags)
{
struct pair_pos_vs_id *pairs = NULL;
uint32_t i;
Expand Down Expand Up @@ -1183,7 +1183,7 @@ int verify_midx_file(struct repository *r, const char *object_dir)
return verify_midx_error;
}

int expire_midx_packs(struct repository *r, const char *object_dir)
int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
{
uint32_t i, *count, result = 0;
struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
Expand Down Expand Up @@ -1315,7 +1315,7 @@ static int fill_included_packs_batch(struct repository *r,
return 0;
}

int midx_repack(struct repository *r, const char *object_dir, size_t batch_size)
int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
{
int result = 0;
uint32_t i;
Expand Down
10 changes: 6 additions & 4 deletions midx.h
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,8 @@ struct multi_pack_index {
char object_dir[FLEX_ARRAY];
};

#define MIDX_PROGRESS (1 << 0)

struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
Expand All @@ -47,11 +49,11 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);

int write_midx_file(const char *object_dir);
int write_midx_file(const char *object_dir, unsigned flags);
void clear_midx_file(struct repository *r);
int verify_midx_file(struct repository *r, const char *object_dir);
int expire_midx_packs(struct repository *r, const char *object_dir);
int midx_repack(struct repository *r, const char *object_dir, size_t batch_size);
int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags);

void close_midx(struct multi_pack_index *m);

Expand Down