Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support diff merges option in range diff #1734

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dscho
Copy link
Member

@dscho dscho commented May 23, 2024

The git range-diff command does the same with merge commits as git rebase: It ignores them.

However, when comparing branch thickets it can be quite illuminating to watch out for inadvertent changes in merge commits, in particular when some "evil" merges have been replayed, i.e. merges that needed to introduce changes outside of the merge conflicts (e.g. when one branch changed a function's signature and another branch introduced a caller of said function), in case the replayed merge is no longer "evil" and therefore potentially incorrect.

Let's introduce support for the --diff-merges option that is passed through to those git log commands.

I had a need for this earlier this year and got it working, leaving the GitGitGadget PR in a draft mode. Phil Blain found it and kindly nerd-sniped me into readying it for submitting, so say thanks to Phil!

Changes since v1:

  • Changed the documentation to recommend first-parent mode instead of vaguely talking about various modes' merits.
  • Disallowed the no-arg --diff-merges option (because --diff-merges requires an argument).

Cc: Philippe Blain levraiphilippeblain@gmail.com
cc: Johannes Sixt j6t@kdbg.org
cc: Elijah Newren newren@gmail.com

@phil-blain
Copy link

Hi @dscho :)

I just came across the need for this today, and remembered you had a WIP PR about it.
Apart from missing tests, is this PR in good shape ? are you using it in your day-to-day ?

@phil-blain
Copy link

In any case, I'll add it to my build and give it a try.

@dscho
Copy link
Member Author

dscho commented Nov 4, 2024

Apart from missing tests, is this PR in good shape ?

@phil-blain Well, missing tests, missing documentation, and missing commit message ;-)

are you using it in your day-to-day ?

No, althought I recently wished for it but didn't find the time to build it in a worktree so that I could use it.

@dscho dscho force-pushed the support-diff-merges-option-in-range-diff branch from ac9ecc8 to 0ae0adc Compare November 7, 2024 15:51
@dscho
Copy link
Member Author

dscho commented Nov 7, 2024

Apart from missing tests, is this PR in good shape ?

@phil-blain Well, missing tests, missing documentation, and missing commit message ;-)

@phil-blain well, now it no longer misses those things...

@dscho dscho force-pushed the support-diff-merges-option-in-range-diff branch 2 times, most recently from 8baf3fa to 11361e0 Compare November 7, 2024 16:16
@dscho dscho marked this pull request as ready for review November 7, 2024 17:17
@dscho
Copy link
Member Author

dscho commented Nov 7, 2024

/submit

Copy link

gitgitgadget bot commented Nov 7, 2024

Submitted as pull.1734.git.1731000007391.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1734/dscho/support-diff-merges-option-in-range-diff-v1

To fetch this version to local tag pr-1734/dscho/support-diff-merges-option-in-range-diff-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1734/dscho/support-diff-merges-option-in-range-diff-v1

Copy link

gitgitgadget bot commented Nov 8, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Let's add corresponding support for `git range-diff`, too. This makes it
> more convenient to spot differences between iterations of non-linear
> contributions, where so-called "evil merges" are sometimes necessary and
> need to be reviewed, too.

Nice addition.

As the "diff of the diff" outer diff part would be two-"blob" diff,
we won't have to worry about this option causing confusions to
users, unlike things like "-U8", to which layer of diffs the option
would apply.

Will queue.  Thanks.

Copy link

gitgitgadget bot commented Nov 8, 2024

This patch series was integrated into seen via git@992e323.

@gitgitgadget gitgitgadget bot added the seen label Nov 8, 2024
Copy link

gitgitgadget bot commented Nov 8, 2024

On the Git mailing list, Johannes Sixt wrote (reply to this):

Am 07.11.24 um 18:20 schrieb Johannes Schindelin via GitGitGadget:
> +--diff-merges=<format>::
> +	Instead of ignoring merge commits, generate diffs for them using the
> +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> +	and include them in the comparison.
> ++
> +Note: Some of the formats supported by linkgit:git-log[1] make less sense in
> +the context of the `range-diff` command than other formats, so choose wisely!

Can we please be a bit more specific which options are usable or which
are not usable? Perhaps you even mean to say that 'first-parent' is the
only one that makes sense? At a minimum, we should mention that it is
the one that makes most sense (if that is the case).

-- Hannes

Copy link

gitgitgadget bot commented Nov 8, 2024

User Johannes Sixt <j6t@kdbg.org> has been added to the cc: list.

Copy link

gitgitgadget bot commented Nov 8, 2024

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Hannes

On Fri, 8 Nov 2024, Johannes Sixt wrote:

> Am 07.11.24 um 18:20 schrieb Johannes Schindelin via GitGitGadget:
> > +--diff-merges=<format>::
> > +	Instead of ignoring merge commits, generate diffs for them using the
> > +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +	and include them in the comparison.
> > ++
> > +Note: Some of the formats supported by linkgit:git-log[1] make less sense in
> > +the context of the `range-diff` command than other formats, so choose wisely!
>
> Can we please be a bit more specific which options are usable or which
> are not usable?

There are a couple of reasons:

- Most importantly: I had not even studied that question in depth when I
  wrote the original patch, and not even when I submitted the first
  iteration.

  In my reviews of branch thickets, I used `--diff-merges=1` and it served
  my needs well. Therefore I can speak to this mode, but not really to the
  other modes.

- Which options are usable or not may lack a universally-correct answer.

  One person might find `--diff-merges=separate` confusing while the next
  person may find it quite useful.

- If the documentation points out an unsuitable mode, then an obvious
  follow-up wish would be to reject this mode in the command, with an
  error message. Some users might, however, need precisely this mode, so
  I am loathe to do that.

Maybe this compromise: Change the "Note:" to recommend the `first-parent`
mode, with a bit more explanation why that is so (it is consistent with
the idea that a merge is kind of a "meta patch", comprising all the merged
commits' patches, which is equivalent to the first-parent diff).

> Perhaps you even mean to say that 'first-parent' is the only one that
> makes sense?

I would not go _so_ far as to say that.

Before submitting the patch yesterday, I played a little with a few
modes, using the commit graph as per the new t3206 test case.

What that test case range-diffs looks like this:

 - * - changeA - changeB ------------
     \                     \          \
      \                      conflict  \
       \                   /            \
         rename - changeA - changeB ----- clean

In other words, the main branch modifies a file's contents twice, the
topic branch renames it first, then makes the same modifications.

The clean merge simply merges the topic, which makes the main branch
tree-same to the topic branch.

The conflicting merge misses the tip commit of the topic branch, i.e.
changeB on the renamed file. The result is tree-same to the clean merge
because changeB on the non-renamed file is already part of the main
branch.

Out of habit, I used `--diff-merge=first-parent` in the new test case,
whose output reiterates what I just said: both merges introduce the same
first-parent diff (renaming the file, no other changes):

	1:  1e6226b < -:  ------- s/12/B/
	2:  043e738 = 1:  6ddec15 merge

What about `separate`?


	2:  043e738 = 1:  6ddec15 merge
	1:  1e6226b ! 2:  6ddec15 s/12/B/
	    @@
	      ## Metadata ##
	    -Author: Thomas Rast <trast@inf.ethz.ch>
	    +Author: A U Thor <author@example.com>

	      ## Commit message ##
	    -    s/12/B/
	    +    merge

	      ## renamed-file ##
	     @@ renamed-file: A

This might look confusing at first because there are two merge diffs on
the right-hand side, but only one on the left-hand side. But that is all
good because the diff of the clean merge between merge head and merge is
empty and therefore not shown. And the `range-diff` demonstrates that the
conflicting merge had to recapitulate the tip commit of the `renamed-file`
topic branch.

The `combined` and `dense-combined` modes produce identical output to
`first-parent` in this instance, which is expected.

Now, let's use `remerge`, which should be illuminating with regards to
"evil merges" [*1*], as it shows what was done on top of the automatic
merge:

	1:  1e6226b < -:  ------- s/12/B/
	2:  043e738 < -:  ------- merge
	-:  ------- > 1:  6ddec15 merge

Huh. That is a bit surprising at first. Let's use a higher creation factor
to ask `range-diff` to match correspondences more loosely (I used 999,
which kinda forces even non-sensical pairings):

	1:  1e6226b ! 1:  6ddec15 s/12/B/
	    @@
	      ## Metadata ##
	    -Author: Thomas Rast <trast@inf.ethz.ch>
	    +Author: A U Thor <author@example.com>

	      ## Commit message ##
	    -    s/12/B/
	    +    merge

	      ## renamed-file ##
	    + remerge CONFLICT (content): Merge conflict in renamed-file
	    + index 3f1c0da..25ddf7a 100644
	    + --- renamed-file
	    + +++ renamed-file
	     @@ renamed-file: A
	      9
	      10
	      B
	    +-<<<<<<< 2901f77 (s/12/B/):file
	    + B
	    +-=======
	     -12
	    -+B
	    +->>>>>>> 3ce7af6 (s/11/B/):renamed-file
	      13
	      14
	      15
	2:  043e738 < -:  ------- merge

(Note that if you repeat this experiment, you will see something different
due to a bug in `--remerge-diff` that I will fix separately.)

Hold on, that looks wrong! It is actually not wrong, `range-diff` just
finds the changeB a better pairing than the merge commit with an empty
diff...

So: This mode can be useful to detect where merge conflicts had to be
resolved.

In short, I think that in `range-diff`'s context all of these modes have
merit, depending on a particular use case. It's just that `first-parent`
is probably the most natural to use in the common case.

> At a minimum, we should mention that it is the one that makes most sense
> (if that is the case).

Yeah, I'll go with that and drop saying anything about other modes.

Ciao,
Johannes

Footnote *1*: We should really find a much better name for this than "evil
merge". There is nothing evil about me having to add `#include
"environment.h"` in v2.45.1^, for example. It was necessary so that the
code would build. Tedious, yes, but not evil.

@dscho dscho force-pushed the support-diff-merges-option-in-range-diff branch from 11361e0 to 7ddc69e Compare November 8, 2024 11:20
The `git log` command already offers support for including diffs for
merges, via the `--diff-merges=<format>` option.

Let's add corresponding support for `git range-diff`, too. This makes it
more convenient to spot differences between iterations of non-linear
contributions, where so-called "evil merges" are sometimes necessary and
need to be reviewed, too.

In my code reviews, I found the `--diff-merges=first-parent` option
particularly useful.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the support-diff-merges-option-in-range-diff branch from 7ddc69e to d01a395 Compare November 8, 2024 11:32
Copy link

gitgitgadget bot commented Nov 8, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Sixt <j6t@kdbg.org> writes:

> Am 07.11.24 um 18:20 schrieb Johannes Schindelin via GitGitGadget:
>> +--diff-merges=<format>::
>> +	Instead of ignoring merge commits, generate diffs for them using the
>> +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
>> +	and include them in the comparison.
>> ++
>> +Note: Some of the formats supported by linkgit:git-log[1] make less sense in
>> +the context of the `range-diff` command than other formats, so choose wisely!
>
> Can we please be a bit more specific which options are usable or which
> are not usable? Perhaps you even mean to say that 'first-parent' is the
> only one that makes sense? At a minimum, we should mention that it is
> the one that makes most sense (if that is the case).

Good suggestion.

I am curious to see how well things like "--cc" and "--remerge-diff"
fare in this use case.  I suspect that in a case where the original
round forgot to make necessary semantic conflict resolutions (aka
"evil merge") that the new round has, all of them (including the
"--first-parent") would essentially show the same change, but what
is shown in such a case would be more readable with "--first-parent"
by treating a merge as if it were just a single parent commit making
a lot of changes, iow, merely one among other commits with equal
footing.

@dscho
Copy link
Member Author

dscho commented Nov 8, 2024

/submit

Copy link

gitgitgadget bot commented Nov 8, 2024

Submitted as pull.1734.v2.git.1731073383564.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1734/dscho/support-diff-merges-option-in-range-diff-v2

To fetch this version to local tag pr-1734/dscho/support-diff-merges-option-in-range-diff-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1734/dscho/support-diff-merges-option-in-range-diff-v2

Copy link

gitgitgadget bot commented Nov 8, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Nov 7, 2024 at 9:20 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `git log` command already offers support for including diffs for
> merges, via the `--diff-merges=<format>` option.
>
> Let's add corresponding support for `git range-diff`, too. This makes it
> more convenient to spot differences between iterations of non-linear
> contributions, where so-called "evil merges" are sometimes necessary and
> need to be reviewed, too.
>
> In my code reviews, I found the `--diff-merges=first-parent` option
> particularly useful.

Curious.  Wouldn't --diff-merges=remerge-diff be more useful if you
are particularly interested in so-called "evil merges" and whether
they remain "evil" (i.e. empty remerge-diff) or gain additional bits
of "evilness" (i.e. more changes shown in the remerge-diff)?

first-parent would seem more like a workaround in such a case.  Let me
explain; first, let me refer to the result that you'd get after
merging with no human changes (i.e. a non-evil merge) as a
hypothetical "auto-merge" commit.  Now, --diff-merges=first-parent
could generally be broken down as the combination of diff from first
parent to auto-merge + diff from auto-merge to evil-merge (even if the
auto-merge wasn't actually recorded anywhere and is just a theoretical
construct).  Now, you aren't looking at a first-parent diff directly,
you are diffing two first-parent diffs.  In particular, you are
comparing:
    pre-rebase first-parent diff = diff from first parent of merge to
the auto-merge + diff from auto-merge to evil-merge
to
    post-rebase first-parent diff = diff from first parent of merge to
the auto-merge + diff from auto-merge to evil-merge

Assuming you didn't drop or insert or modify any commits as part of
the rebase, then the two "diff from first parent of merge to the
auto-merge" should match.  Since they match, taking the difference of
these two causes that part to cancel out, meaning you are left just
looking at the differences in the "evilness" of the actual merge.  But
if you did make other changes while rebasing, maybe dropping or
tweaking a commit, then suddenly you aren't just looking at
differences in the "evilness" of the actual merge anymore; it's mixed
with those other changes making it more challenging to review and easy
to miss the parts you are looking for.  If you want to look for
differences in whether the merge commit in question has changes other
than those that a simple "git merge" would make, remerge-diff seems
like a better choice.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>     Support diff merges option in range diff
>
>     The git range-diff command does the same with merge commits as git
>     rebase: It ignores them.
>
>     However, when comparing branch thickets it can be quite illuminating to
>     watch out for inadvertent changes in merge commits, in particular when
>     some "evil" merges have been replayed, i.e. merges that needed to
>     introduce changes outside of the merge conflicts (e.g. when one branch
>     changed a function's signature and another branch introduced a caller of
>     said function), in case the replayed merge is no longer "evil" and
>     therefore potentially incorrect.
>
>     Let's introduce support for the --diff-merges option that is passed
>     through to those git log commands.
>
>     I had a need for this earlier this year and got it working, leaving the
>     GitGitGadget PR in a draft mode. Phil Blain found it and kindly
>     nerd-sniped me into readying it for submitting, so say thanks to Phil!
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1734%2Fdscho%2Fsupport-diff-merges-option-in-range-diff-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1734/dscho/support-diff-merges-option-in-range-diff-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1734
>
>  Documentation/git-range-diff.txt | 10 +++++++++-
>  builtin/range-diff.c             | 11 +++++++++++
>  range-diff.c                     | 15 +++++++++++----
>  range-diff.h                     |  1 +
>  t/t3206-range-diff.sh            | 16 ++++++++++++++++
>  5 files changed, 48 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> index fbdbe0befeb..a964e856c3c 100644
> --- a/Documentation/git-range-diff.txt
> +++ b/Documentation/git-range-diff.txt
> @@ -10,7 +10,7 @@ SYNOPSIS
>  [verse]
>  'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
>         [--no-dual-color] [--creation-factor=<factor>]
> -       [--left-only | --right-only]
> +       [--left-only | --right-only] [--diff-merges=<format>]
>         ( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )
>         [[--] <path>...]
>
> @@ -81,6 +81,14 @@ to revert to color all lines according to the outer diff markers
>         Suppress commits that are missing from the second specified range
>         (or the "right range" when using the `<rev1>...<rev2>` format).
>
> +--diff-merges=<format>::
> +       Instead of ignoring merge commits, generate diffs for them using the
> +       corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> +       and include them in the comparison.
> ++
> +Note: Some of the formats supported by linkgit:git-log[1] make less sense in
> +the context of the `range-diff` command than other formats, so choose wisely!
> +

Indeed.  :-)

>  --[no-]notes[=<ref>]::
>         This flag is passed to the `git log` program
>         (see linkgit:git-log[1]) that generates the patches.
> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> index 1b33ab66a7b..e41719e0f0d 100644
> --- a/builtin/range-diff.c
> +++ b/builtin/range-diff.c
> @@ -21,6 +21,7 @@ int cmd_range_diff(int argc,
>  {
>         struct diff_options diffopt = { NULL };
>         struct strvec other_arg = STRVEC_INIT;
> +       struct strvec diff_merges_arg = STRVEC_INIT;
>         struct range_diff_options range_diff_opts = {
>                 .creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT,
>                 .diffopt = &diffopt,
> @@ -36,6 +37,9 @@ int cmd_range_diff(int argc,
>                 OPT_PASSTHRU_ARGV(0, "notes", &other_arg,
>                                   N_("notes"), N_("passed to 'git log'"),
>                                   PARSE_OPT_OPTARG),
> +               OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> +                                 N_("style"), N_("passed to 'git log'"),
> +                                 PARSE_OPT_OPTARG),
>                 OPT_BOOL(0, "left-only", &left_only,
>                          N_("only emit output related to the first range")),
>                 OPT_BOOL(0, "right-only", &right_only,
> @@ -62,6 +66,12 @@ int cmd_range_diff(int argc,
>         if (!simple_color)
>                 diffopt.use_color = 1;
>
> +       /* If `--diff-merges` was specified, imply `--merges` */
> +       if (diff_merges_arg.nr) {
> +               range_diff_opts.include_merges = 1;
> +               strvec_pushv(&other_arg, diff_merges_arg.v);
> +       }
> +
>         for (i = 0; i < argc; i++)
>                 if (!strcmp(argv[i], "--")) {
>                         dash_dash = i;
> @@ -155,6 +165,7 @@ int cmd_range_diff(int argc,
>         res = show_range_diff(range1.buf, range2.buf, &range_diff_opts);
>
>         strvec_clear(&other_arg);
> +       strvec_clear(&diff_merges_arg);
>         strbuf_release(&range1);
>         strbuf_release(&range2);
>
> diff --git a/range-diff.c b/range-diff.c
> index bbb0952264b..9e59733059b 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -38,7 +38,8 @@ struct patch_util {
>   * as struct object_id (will need to be free()d).
>   */
>  static int read_patches(const char *range, struct string_list *list,
> -                       const struct strvec *other_arg)
> +                       const struct strvec *other_arg,
> +                       unsigned int include_merges)
>  {
>         struct child_process cp = CHILD_PROCESS_INIT;
>         struct strbuf buf = STRBUF_INIT, contents = STRBUF_INIT;
> @@ -49,7 +50,7 @@ static int read_patches(const char *range, struct string_list *list,
>         size_t size;
>         int ret = -1;
>
> -       strvec_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges",
> +       strvec_pushl(&cp.args, "log", "--no-color", "-p",
>                      "--reverse", "--date-order", "--decorate=no",
>                      "--no-prefix", "--submodule=short",
>                      /*
> @@ -64,6 +65,8 @@ static int read_patches(const char *range, struct string_list *list,
>                      "--pretty=medium",
>                      "--show-notes-by-default",
>                      NULL);


>
> -- Hannes
>
>
> +       if (!include_merges)
> +               strvec_push(&cp.args, "--no-merges");
>         strvec_push(&cp.args, range);
>         if (other_arg)
>                 strvec_pushv(&cp.args, other_arg->v);
> @@ -96,11 +99,14 @@ static int read_patches(const char *range, struct string_list *list,
>                 }
>
>                 if (skip_prefix(line, "commit ", &p)) {
> +                       char *q;
>                         if (util) {
>                                 string_list_append(list, buf.buf)->util = util;
>                                 strbuf_reset(&buf);
>                         }
>                         CALLOC_ARRAY(util, 1);
> +                       if (include_merges && (q = strstr(p, " (from ")))
> +                               *q = '\0';
>                         if (repo_get_oid(the_repository, p, &util->oid)) {
>                                 error(_("could not parse commit '%s'"), p);
>                                 FREE_AND_NULL(util);
> @@ -571,13 +577,14 @@ int show_range_diff(const char *range1, const char *range2,
>
>         struct string_list branch1 = STRING_LIST_INIT_DUP;
>         struct string_list branch2 = STRING_LIST_INIT_DUP;
> +       unsigned int include_merges = range_diff_opts->include_merges;
>
>         if (range_diff_opts->left_only && range_diff_opts->right_only)
>                 res = error(_("options '%s' and '%s' cannot be used together"), "--left-only", "--right-only");
>
> -       if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg))
> +       if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg, include_merges))
>                 res = error(_("could not parse log for '%s'"), range1);
> -       if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg))
> +       if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg, include_merges))
>                 res = error(_("could not parse log for '%s'"), range2);
>
>         if (!res) {
> diff --git a/range-diff.h b/range-diff.h
> index 2f69f6a434d..cd85000b5a0 100644
> --- a/range-diff.h
> +++ b/range-diff.h
> @@ -16,6 +16,7 @@ struct range_diff_options {
>         int creation_factor;
>         unsigned dual_color:1;
>         unsigned left_only:1, right_only:1;
> +       unsigned include_merges:1;
>         const struct diff_options *diffopt; /* may be NULL */
>         const struct strvec *other_arg; /* may be NULL */
>  };
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 86010931ab6..c18a3fdab83 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -909,4 +909,20 @@ test_expect_success 'submodule changes are shown irrespective of diff.submodule'
>         test_cmp expect actual
>  '
>
> +test_expect_success '--diff-merges' '
> +       renamed_oid=$(git rev-parse --short renamed-file) &&
> +       tree=$(git merge-tree unmodified renamed-file) &&
> +       clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) &&
> +       clean_oid=$(git rev-parse --short $clean) &&
> +       conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) &&
> +       conflict_oid=$(git rev-parse --short $conflict) &&
> +
> +       git range-diff --diff-merges=1 $clean...$conflict >actual &&
> +       cat >expect <<-EOF &&
> +       1:  $renamed_oid < -:  ------- s/12/B/
> +       2:  $clean_oid = 1:  $conflict_oid merge
> +       EOF
> +       test_cmp expect actual
> +'
> +
>  test_done
>
> base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2
> --
> gitgitgadget

Didn't spot any problems with the patch itself.

Copy link

gitgitgadget bot commented Nov 8, 2024

User Elijah Newren <newren@gmail.com> has been added to the cc: list.

Copy link

gitgitgadget bot commented Nov 8, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Nov 7, 2024 at 10:54 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 07.11.24 um 18:20 schrieb Johannes Schindelin via GitGitGadget:
> > +--diff-merges=<format>::
> > +     Instead of ignoring merge commits, generate diffs for them using the
> > +     corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +     and include them in the comparison.
> > ++
> > +Note: Some of the formats supported by linkgit:git-log[1] make less sense in
> > +the context of the `range-diff` command than other formats, so choose wisely!
>
> Can we please be a bit more specific which options are usable or which
> are not usable? Perhaps you even mean to say that 'first-parent' is the
> only one that makes sense? At a minimum, we should mention that it is
> the one that makes most sense (if that is the case).

Here's my guesses I mostly typed up last night before getting tired
and going to bed:

separate: makes no sense; showing two diffs for each merge and then
trying to diff the two sets of two diffs seems crazy, especially since
the whole point of rebasing is usually to change the first parent
somewhere in the ancestry; that'll cascade up and cause the diffs to
the second parents to differ wildly and make this format uselessly
noisy in general.

combined: much better than separate, but probably still quite noisy and
confusing, especially since combined-diff and range-diff are both
complicated in that they have two diffs each, and combining the two
means you have a diff of a dual diff.  I suspect that might be hard for users to
process.

dense-combined: much better than combined, you at least have the diff
down to the smallest relevant pieces.  You may still deal with the
diff of a dual diff problem.

first-parent: useful, but only when keeping the series of patches largely
intact and transplanting them; any insertion, deletion, or
modification of intermediate patches will result in those
modifications being shown
not just for the patch in question but rolled up into the merge commit
and shown there as well, making the diff for the merge very noisy
(possibly to the point of uselessness).  When
keeping the series of patches intact and transplanting them, though,
likely a useful mode.

remerge-diff: generally useful, even if you insert, delete, or modify
intermediate patches while rebasing.  Since it also shows how conflict
resolutions are done, you'd see how conflicts are resolved
differently.  Actually, that might be one drawback to this method;
since conflict markers try to provide labels involving a short commit
ID and the commit message, and those short commit IDs will differ,
this mode would just end up showing how the two merges had different
parents, which you already know.  We could probably make this option
much more useful by surpressing the conflict labels, or at least the
short commit ID parts of those labels.  If we do that, and take
Johannes' nice fix he sent to the list separately, then I suspect this
mode is generally the most useful one to use with range-diff.

Copy link

gitgitgadget bot commented Nov 8, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, Nov 8, 2024 at 5:43 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `git log` command already offers support for including diffs for
> merges, via the `--diff-merges=<format>` option.
>
> Let's add corresponding support for `git range-diff`, too. This makes it
> more convenient to spot differences between iterations of non-linear
> contributions, where so-called "evil merges" are sometimes necessary and
> need to be reviewed, too.
>
> In my code reviews, I found the `--diff-merges=first-parent` option
> particularly useful.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>     Support diff merges option in range diff
>
>     The git range-diff command does the same with merge commits as git
>     rebase: It ignores them.
>
>     However, when comparing branch thickets it can be quite illuminating to
>     watch out for inadvertent changes in merge commits, in particular when
>     some "evil" merges have been replayed, i.e. merges that needed to
>     introduce changes outside of the merge conflicts (e.g. when one branch
>     changed a function's signature and another branch introduced a caller of
>     said function), in case the replayed merge is no longer "evil" and
>     therefore potentially incorrect.
>
>     Let's introduce support for the --diff-merges option that is passed
>     through to those git log commands.
>
>     I had a need for this earlier this year and got it working, leaving the
>     GitGitGadget PR in a draft mode. Phil Blain found it and kindly
>     nerd-sniped me into readying it for submitting, so say thanks to Phil!
>
>     Changes since v1:
>
>      * Changed the documentation to recommend first-parent mode instead of
>        vaguely talking about various modes' merits.
>      * Disallowed the no-arg --diff-merges option (because --diff-merges
>        requires an argument).
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1734%2Fdscho%2Fsupport-diff-merges-option-in-range-diff-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1734/dscho/support-diff-merges-option-in-range-diff-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1734
>
> Range-diff vs v1:
>
>  1:  11361e07af8 ! 1:  d01a395900b range-diff: optionally include merge commits' diffs in the analysis
>      @@ Documentation/git-range-diff.txt: to revert to color all lines according to the
>       + corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
>       + and include them in the comparison.
>       ++
>      -+Note: Some of the formats supported by linkgit:git-log[1] make less sense in
>      -+the context of the `range-diff` command than other formats, so choose wisely!
>      ++Note: In the common case, the `first-parent` mode will be the most natural one
>      ++to use, as it is consistent with the idea that a merge is kind of a "meta
>      ++patch", comprising all the merged commits' patches into a single one.
>       +
>        --[no-]notes[=<ref>]::
>         This flag is passed to the `git log` program
>      @@ builtin/range-diff.c: int cmd_range_diff(int argc,
>                                   N_("notes"), N_("passed to 'git log'"),
>                                   PARSE_OPT_OPTARG),
>       +         OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
>      -+                           N_("style"), N_("passed to 'git log'"),
>      -+                           PARSE_OPT_OPTARG),
>      ++                           N_("style"), N_("passed to 'git log'"), 0),
>                 OPT_BOOL(0, "left-only", &left_only,
>                          N_("only emit output related to the first range")),
>                 OPT_BOOL(0, "right-only", &right_only,
>
>
>  Documentation/git-range-diff.txt | 11 ++++++++++-
>  builtin/range-diff.c             | 10 ++++++++++
>  range-diff.c                     | 15 +++++++++++----
>  range-diff.h                     |  1 +
>  t/t3206-range-diff.sh            | 16 ++++++++++++++++
>  5 files changed, 48 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> index fbdbe0befeb..17a85957877 100644
> --- a/Documentation/git-range-diff.txt
> +++ b/Documentation/git-range-diff.txt
> @@ -10,7 +10,7 @@ SYNOPSIS
>  [verse]
>  'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
>         [--no-dual-color] [--creation-factor=<factor>]
> -       [--left-only | --right-only]
> +       [--left-only | --right-only] [--diff-merges=<format>]
>         ( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )
>         [[--] <path>...]
>
> @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers
>         Suppress commits that are missing from the second specified range
>         (or the "right range" when using the `<rev1>...<rev2>` format).
>
> +--diff-merges=<format>::
> +       Instead of ignoring merge commits, generate diffs for them using the
> +       corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> +       and include them in the comparison.
> ++
> +Note: In the common case, the `first-parent` mode will be the most natural one
> +to use,

I think we need more wording around "common case"; I believe this
"common case" is when you are diffing against a merely transplanted
series of commits (a series created by rebasing without inserting,
removing, or minimally modifying those commits in the process) that
`first-parent` makes sense.  And it only makes sense in that case.

I think `remerge-diff` generally makes sense here, both in the cases
when `first-parent` makes sense and when `first-parent` does not.  It
could be improved by suppressing the inclusion of short commit ids
(and maybe also commit messages) in the labels of conflict markers.  I
suspect that issue might make `remerge-diff` less useful than
`first-parent` in simple common cases currently, but I think it's the
right thing to build upon for what you are trying to view.

If `remerge-diff` didn't exist, I think I'd always use
`dense-combined` over `first-parent` because of this
merely-transplanted limitation.  I suspect dense-combined would
probably be kind of ugly and hard to parse when there is an actual
evil merge, but it'd at least limit what it shows to the evil merge
content.

> as it is consistent with the idea that a merge is kind of a "meta
> +patch", comprising all the merged commits' patches into a single one.

Doesn't this wording of yours naturally lead to the idea that
`first-parent` is a bad choice if patches leading to the merge have
been inserted, removed, or more than minimally modified?  It points
out that first-parent is a diff over the whole range, and so
range-diff shows you a diff of the diffs over the whole range.  If you
want to look at the "evilness" of merge commits, i.e. the
user-generated portion of the diffs included in by merge commits, you
need either remerge-diff or dense-combined.

> +
>  --[no-]notes[=<ref>]::
>         This flag is passed to the `git log` program
>         (see linkgit:git-log[1]) that generates the patches.
> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> index 1b33ab66a7b..901de5d133d 100644
> --- a/builtin/range-diff.c
> +++ b/builtin/range-diff.c
> @@ -21,6 +21,7 @@ int cmd_range_diff(int argc,
>  {
>         struct diff_options diffopt = { NULL };
>         struct strvec other_arg = STRVEC_INIT;
> +       struct strvec diff_merges_arg = STRVEC_INIT;
>         struct range_diff_options range_diff_opts = {
>                 .creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT,
>                 .diffopt = &diffopt,
> @@ -36,6 +37,8 @@ int cmd_range_diff(int argc,
>                 OPT_PASSTHRU_ARGV(0, "notes", &other_arg,
>                                   N_("notes"), N_("passed to 'git log'"),
>                                   PARSE_OPT_OPTARG),
> +               OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> +                                 N_("style"), N_("passed to 'git log'"), 0),
>                 OPT_BOOL(0, "left-only", &left_only,
>                          N_("only emit output related to the first range")),
>                 OPT_BOOL(0, "right-only", &right_only,
> @@ -62,6 +65,12 @@ int cmd_range_diff(int argc,
>         if (!simple_color)
>                 diffopt.use_color = 1;
>
> +       /* If `--diff-merges` was specified, imply `--merges` */
> +       if (diff_merges_arg.nr) {
> +               range_diff_opts.include_merges = 1;
> +               strvec_pushv(&other_arg, diff_merges_arg.v);
> +       }
> +
>         for (i = 0; i < argc; i++)
>                 if (!strcmp(argv[i], "--")) {
>                         dash_dash = i;
> @@ -155,6 +164,7 @@ int cmd_range_diff(int argc,
>         res = show_range_diff(range1.buf, range2.buf, &range_diff_opts);
>
>         strvec_clear(&other_arg);
> +       strvec_clear(&diff_merges_arg);
>         strbuf_release(&range1);
>         strbuf_release(&range2);
>
> diff --git a/range-diff.c b/range-diff.c
> index bbb0952264b..9e59733059b 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -38,7 +38,8 @@ struct patch_util {
>   * as struct object_id (will need to be free()d).
>   */
>  static int read_patches(const char *range, struct string_list *list,
> -                       const struct strvec *other_arg)
> +                       const struct strvec *other_arg,
> +                       unsigned int include_merges)
>  {
>         struct child_process cp = CHILD_PROCESS_INIT;
>         struct strbuf buf = STRBUF_INIT, contents = STRBUF_INIT;
> @@ -49,7 +50,7 @@ static int read_patches(const char *range, struct string_list *list,
>         size_t size;
>         int ret = -1;
>
> -       strvec_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges",
> +       strvec_pushl(&cp.args, "log", "--no-color", "-p",
>                      "--reverse", "--date-order", "--decorate=no",
>                      "--no-prefix", "--submodule=short",
>                      /*
> @@ -64,6 +65,8 @@ static int read_patches(const char *range, struct string_list *list,
>                      "--pretty=medium",
>                      "--show-notes-by-default",
>                      NULL);
> +       if (!include_merges)
> +               strvec_push(&cp.args, "--no-merges");
>         strvec_push(&cp.args, range);
>         if (other_arg)
>                 strvec_pushv(&cp.args, other_arg->v);
> @@ -96,11 +99,14 @@ static int read_patches(const char *range, struct string_list *list,
>                 }
>
>                 if (skip_prefix(line, "commit ", &p)) {
> +                       char *q;
>                         if (util) {
>                                 string_list_append(list, buf.buf)->util = util;
>                                 strbuf_reset(&buf);
>                         }
>                         CALLOC_ARRAY(util, 1);
> +                       if (include_merges && (q = strstr(p, " (from ")))
> +                               *q = '\0';
>                         if (repo_get_oid(the_repository, p, &util->oid)) {
>                                 error(_("could not parse commit '%s'"), p);
>                                 FREE_AND_NULL(util);
> @@ -571,13 +577,14 @@ int show_range_diff(const char *range1, const char *range2,
>
>         struct string_list branch1 = STRING_LIST_INIT_DUP;
>         struct string_list branch2 = STRING_LIST_INIT_DUP;
> +       unsigned int include_merges = range_diff_opts->include_merges;
>
>         if (range_diff_opts->left_only && range_diff_opts->right_only)
>                 res = error(_("options '%s' and '%s' cannot be used together"), "--left-only", "--right-only");
>
> -       if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg))
> +       if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg, include_merges))
>                 res = error(_("could not parse log for '%s'"), range1);
> -       if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg))
> +       if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg, include_merges))
>                 res = error(_("could not parse log for '%s'"), range2);
>
>         if (!res) {
> diff --git a/range-diff.h b/range-diff.h
> index 2f69f6a434d..cd85000b5a0 100644
> --- a/range-diff.h
> +++ b/range-diff.h
> @@ -16,6 +16,7 @@ struct range_diff_options {
>         int creation_factor;
>         unsigned dual_color:1;
>         unsigned left_only:1, right_only:1;
> +       unsigned include_merges:1;
>         const struct diff_options *diffopt; /* may be NULL */
>         const struct strvec *other_arg; /* may be NULL */
>  };
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 86010931ab6..c18a3fdab83 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -909,4 +909,20 @@ test_expect_success 'submodule changes are shown irrespective of diff.submodule'
>         test_cmp expect actual
>  '
>
> +test_expect_success '--diff-merges' '
> +       renamed_oid=$(git rev-parse --short renamed-file) &&
> +       tree=$(git merge-tree unmodified renamed-file) &&
> +       clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) &&
> +       clean_oid=$(git rev-parse --short $clean) &&
> +       conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) &&
> +       conflict_oid=$(git rev-parse --short $conflict) &&
> +
> +       git range-diff --diff-merges=1 $clean...$conflict >actual &&
> +       cat >expect <<-EOF &&
> +       1:  $renamed_oid < -:  ------- s/12/B/
> +       2:  $clean_oid = 1:  $conflict_oid merge
> +       EOF
> +       test_cmp expect actual
> +'
> +
>  test_done
>
> base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2
> --
> gitgitgadget

Copy link

gitgitgadget bot commented Nov 9, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

One more thing...

On Fri, Nov 8, 2024 at 3:04 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
[...]
> Footnote *1*: We should really find a much better name for this than "evil
> merge". There is nothing evil about me having to add `#include
> "environment.h"` in v2.45.1^, for example. It was necessary so that the
> code would build. Tedious, yes, but not evil.

Indeed, I've felt it's problematic for a while too and had to take
time on at least one occasion to mention to someone else that I don't
actually mean "evil" it's just that "evil merge" as a compound term is
the convenient nomenclature we've been using for all of these,
regardless of whether the user modified the merge to resolve syntactic
conflicts, or to resolve semantic conflicts, or to sneak in "evil"
changes.  That's particularly odd since the first category is the most
common, and the third (snuck in unrelated changes or "evil changes")
are the most rare.  Maybe we should just call these "user-modified
merges" rather than "evil merges"?  Any better suggestions?

Copy link

gitgitgadget bot commented Nov 10, 2024

On the Git mailing list, Philippe Blain wrote (reply to this):

Hi Dscho,

Le 2024-11-08 à 08:43, Johannes Schindelin via GitGitGadget a écrit :
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> The `git log` command already offers support for including diffs for
> merges, via the `--diff-merges=<format>` option.
> 
> Let's add corresponding support for `git range-diff`, too. This makes it
> more convenient to spot differences between iterations of non-linear
> contributions, where so-called "evil merges" are sometimes necessary and
> need to be reviewed, too.

Maybe "between commit ranges that include merge commits" would be more
workflow-agnostic ? Just thinking out loud here, I think what you wrote 
is also OK.

> In my code reviews, I found the `--diff-merges=first-parent` option
> particularly useful.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>     Support diff merges option in range diff
>     
>     The git range-diff command does the same with merge commits as git
>     rebase: It ignores them.
>     
>     However, when comparing branch thickets it can be quite illuminating to
>     watch out for inadvertent changes in merge commits, in particular when
>     some "evil" merges have been replayed, i.e. merges that needed to
>     introduce changes outside of the merge conflicts (e.g. when one branch
>     changed a function's signature and another branch introduced a caller of
>     said function), in case the replayed merge is no longer "evil" and
>     therefore potentially incorrect.
>     
>     Let's introduce support for the --diff-merges option that is passed
>     through to those git log commands.
>     
>     I had a need for this earlier this year and got it working, leaving the
>     GitGitGadget PR in a draft mode. Phil Blain found it and kindly
>     nerd-sniped me into readying it for submitting, so say thanks to Phil!
>     
>     Changes since v1:
>     
>      * Changed the documentation to recommend first-parent mode instead of
>        vaguely talking about various modes' merits.
>      * Disallowed the no-arg --diff-merges option (because --diff-merges
>        requires an argument).
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1734%2Fdscho%2Fsupport-diff-merges-option-in-range-diff-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1734/dscho/support-diff-merges-option-in-range-diff-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1734
> 
> Range-diff vs v1:
> 
>  1:  11361e07af8 ! 1:  d01a395900b range-diff: optionally include merge commits' diffs in the analysis
>      @@ Documentation/git-range-diff.txt: to revert to color all lines according to the
>       +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
>       +	and include them in the comparison.
>       ++
>      -+Note: Some of the formats supported by linkgit:git-log[1] make less sense in
>      -+the context of the `range-diff` command than other formats, so choose wisely!
>      ++Note: In the common case, the `first-parent` mode will be the most natural one
>      ++to use, as it is consistent with the idea that a merge is kind of a "meta
>      ++patch", comprising all the merged commits' patches into a single one.
>       +
>        --[no-]notes[=<ref>]::
>        	This flag is passed to the `git log` program
>      @@ builtin/range-diff.c: int cmd_range_diff(int argc,
>        				  N_("notes"), N_("passed to 'git log'"),
>        				  PARSE_OPT_OPTARG),
>       +		OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
>      -+				  N_("style"), N_("passed to 'git log'"),
>      -+				  PARSE_OPT_OPTARG),
>      ++				  N_("style"), N_("passed to 'git log'"), 0),
>        		OPT_BOOL(0, "left-only", &left_only,
>        			 N_("only emit output related to the first range")),
>        		OPT_BOOL(0, "right-only", &right_only,
> 
> 
>  Documentation/git-range-diff.txt | 11 ++++++++++-
>  builtin/range-diff.c             | 10 ++++++++++
>  range-diff.c                     | 15 +++++++++++----
>  range-diff.h                     |  1 +
>  t/t3206-range-diff.sh            | 16 ++++++++++++++++
>  5 files changed, 48 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> index fbdbe0befeb..17a85957877 100644
> --- a/Documentation/git-range-diff.txt
> +++ b/Documentation/git-range-diff.txt
> @@ -10,7 +10,7 @@ SYNOPSIS
>  [verse]
>  'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
>  	[--no-dual-color] [--creation-factor=<factor>]
> -	[--left-only | --right-only]
> +	[--left-only | --right-only] [--diff-merges=<format>]
>  	( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )
>  	[[--] <path>...]
>  
> @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers
>  	Suppress commits that are missing from the second specified range
>  	(or the "right range" when using the `<rev1>...<rev2>` format).
>  
> +--diff-merges=<format>::
> +	Instead of ignoring merge commits, generate diffs for them using the
> +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> +	and include them in the comparison.
> ++
> +Note: In the common case, the `first-parent` mode will be the most natural one
> +to use, as it is consistent with the idea that a merge is kind of a "meta
> +patch", comprising all the merged commits' patches into a single one.

I think I agree with Elijah that we probably should also highlight at least
'remerge'.

Also, is it worth making this a proper Asciidoc "[NOTE]" ? (I'm not sure, there are 
a lot of "Notes:" in our doc that are not Asciidoc "[NOTE]"s.

> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> index 1b33ab66a7b..901de5d133d 100644
> --- a/builtin/range-diff.c
> +++ b/builtin/range-diff.c

The changes look good to me. Maybe it would be nice to add a corresponding
'range-diff.diffMerges' config option to allow users to configure the 
behaviour more permanently ?

> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 86010931ab6..c18a3fdab83 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -909,4 +909,20 @@ test_expect_success 'submodule changes are shown irrespective of diff.submodule'
>  	test_cmp expect actual
>  '
>  
> +test_expect_success '--diff-merges' '
> +	renamed_oid=$(git rev-parse --short renamed-file) &&
> +	tree=$(git merge-tree unmodified renamed-file) &&
> +	clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) &&
> +	clean_oid=$(git rev-parse --short $clean) &&
> +	conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) &&
> +	conflict_oid=$(git rev-parse --short $conflict) &&
> +
> +	git range-diff --diff-merges=1 $clean...$conflict >actual &&
> +	cat >expect <<-EOF &&
> +	1:  $renamed_oid < -:  ------- s/12/B/
> +	2:  $clean_oid = 1:  $conflict_oid merge
> +	EOF
> +	test_cmp expect actual
> +'

The test looks good. 

Thanks for submitting that patch!

Philippe.

Copy link

gitgitgadget bot commented Nov 11, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <newren@gmail.com> writes:

> That's particularly odd since the first category is the most
> common, and the third (snuck in unrelated changes or "evil changes")
> are the most rare.  Maybe we should just call these "user-modified
> merges" rather than "evil merges"?  Any better suggestions?

It is "evil" in the sense that it makes it really tough for those
who dig history to find out what happened later to figure out where
the change came from.  It could be an attempt to hide a malicious
and unrelated change, it could be a resolution for a non-textual
conflict, or it could be a "while at it I am fixing an unrelated
typo I spotted nearby while resolving this textual conflict".

The change does not have to have an evil intent.  It is just
something different from and more than what the textual and
mechanical merge would produce.

"more than" may give us a hint for what we want to convey in the
name, if we were to pick a new name.  If a new topic changed a
function signature to add one extra parameter, and updated all the
callers of the function that existed back when the topic forked, an
"evil" merge needs to address the semantic conflicts to do a very
similar update to a new call to the function added on the mainline
since the topic forked, when the topic is merged back to the
mainline.  Taken as a whole, the first-parent diff of such a merge
will show that a function gained a parameter, and all of its callers
have been adjusted for that change.  Most of such changes would have
come from the side branch, but on top of that, the merge needed to
contribute more of the similar changes to callers.

Copy link

gitgitgadget bot commented Nov 11, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

>      * Disallowed the no-arg --diff-merges option (because --diff-merges
>        requires an argument).

Yup, I was happy to see it in the range-diff output, because I was
wondering where the optarg came from in the initial submission.

> +--diff-merges=<format>::
> +	Instead of ignoring merge commits, generate diffs for them using the
> +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> +	and include them in the comparison.
> ++
> +Note: In the common case, the `first-parent` mode will be the most natural one
> +to use, as it is consistent with the idea that a merge is kind of a "meta
> +patch", comprising all the merged commits' patches into a single one.

I'll let you and Elijah figure out the best phrasing around here,
including whether we recommend only first-parent or give other
options.

For me personally, I find that use of `first-parent` here is
consistent with the mindset of those users who prefer to read "git
log --first-parent -p", and to a smaller degree, those who use
squash merges.  To them, a merge is merely bringing in lots of
changes into the mainline as a single unit, and other than that
these lots of changes are broken out for finer grained inspection
if/when they wanted to, they prefer to treat the changes brought in
by merges just like a single-parent commit.  And from that point of
view, (1) reordering the changes inside the side branch is
immaterial for the purpose of talking about the result, the net
effect the merged topic makes to the mainline, and (2) dropping or
adding new changes to the side branch is treated just like a newer
iteration of a single parent commit having fewer or more changes.
So it is very natural that first-parent helps to make the world view
very simplistic, and more readable range-diff is all about how you
can present the two iterations in a simple and flattened form.

There _may_ need a tweak of the matching algorithm to allow the
"same" merge on both sides to match, even if they diverge a lot,
though.  A range-diff that pairs a merge in the previous iteration
with a single patch in the updated iteration may be hard to read.

Copy link

gitgitgadget bot commented Nov 11, 2024

This patch series was integrated into seen via git@8041621.

Copy link

gitgitgadget bot commented Nov 11, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Nov 10, 2024 at 4:37 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
[...]
> > +--diff-merges=<format>::
> > +     Instead of ignoring merge commits, generate diffs for them using the
> > +     corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +     and include them in the comparison.
> > ++
> > +Note: In the common case, the `first-parent` mode will be the most natural one
> > +to use, as it is consistent with the idea that a merge is kind of a "meta
> > +patch", comprising all the merged commits' patches into a single one.
>
> I'll let you and Elijah figure out the best phrasing around here,
> including whether we recommend only first-parent or give other
> options.

I'd probably suggest:

Note: When trying to see how users adjusted the contents of a merge
commit, `remerge-diff` is a natural choice.  When viewing merges as a
"meta patch", comprising a squash of all the merged commits' patches
into a single one (similar to `git log --first-parent -p`),
`first-parent` is a natural choice.

> For me personally, I find that use of `first-parent` here is
> consistent with the mindset of those users who prefer to read "git
> log --first-parent -p", and to a smaller degree, those who use
> squash merges.  To them, a merge is merely bringing in lots of
> changes into the mainline as a single unit, and other than that
> these lots of changes are broken out for finer grained inspection
> if/when they wanted to, they prefer to treat the changes brought in
> by merges just like a single-parent commit.  And from that point of
> view, (1) reordering the changes inside the side branch is
> immaterial for the purpose of talking about the result, the net
> effect the merged topic makes to the mainline, and (2) dropping or
> adding new changes to the side branch is treated just like a newer
> iteration of a single parent commit having fewer or more changes.
> So it is very natural that first-parent helps to make the world view
> very simplistic, and more readable range-diff is all about how you
> can present the two iterations in a simple and flattened form.
>
> There _may_ need a tweak of the matching algorithm to allow the
> "same" merge on both sides to match, even if they diverge a lot,
> though.  A range-diff that pairs a merge in the previous iteration
> with a single patch in the updated iteration may be hard to read.

Sounds like you are arguing that there is an angle/usecase from which
`first-parent` makes sense, namely the viewpoint that "a merge is
merely bringing in lots of changes into the mainline as a single unit"
as you put it.  What was surprising to me, though, is that it's a
completely different usecase than the one that was brought up in the
commit message for this feature, namely "so-called 'evil merges' are
sometimes necessary and need to be reviewed too".

If you want to review "evilness" or the differences between the
changes that `git merge` would make and what got recorded in the
commit, then `first-parent` is not that great an approximation of what
you are looking for, BUT it happens to work beautifully in common
cases where you are merely transplanting commits due to the fact that
the huge amounts of extras being put into the diff happen to be equal
on the two sides, so they cancel out and leave you with the "evilness"
you are interested in.  But if you stray from that common case (by
e.g. inserting, removing, or modifying commits while rebasing) and are
still interested in "evilness", your approximation quickly drops from
accurately pulling out just the bits you are interested in towards
being completely swamped by the noise.

My concern with the commit message and patch as they stand, is that I
feel the `first-parent` mode is being recommended as a solution for a
certain usecase where it is really only an approximation that luckily
is highly accurate in special but common scenarios while being very
inaccurate in others.  I think that will lead to users getting
negatively surprised when they move outside those common cases.

Copy link

gitgitgadget bot commented Nov 11, 2024

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Fri, 8 Nov 2024, Elijah Newren wrote:

> On Fri, Nov 8, 2024 at 5:43 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> > index fbdbe0befeb..17a85957877 100644
> > --- a/Documentation/git-range-diff.txt
> > +++ b/Documentation/git-range-diff.txt
> > @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers
> >         Suppress commits that are missing from the second specified range
> >         (or the "right range" when using the `<rev1>...<rev2>` format).
> >
> > +--diff-merges=<format>::
> > +       Instead of ignoring merge commits, generate diffs for them using the
> > +       corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +       and include them in the comparison.
> > ++
> > +Note: In the common case, the `first-parent` mode will be the most natural one
> > +to use,
>
> I think we need more wording around "common case"; I believe this
> "common case" is when you are diffing against a merely transplanted
> series of commits (a series created by rebasing without inserting,
> removing, or minimally modifying those commits in the process) that
> `first-parent` makes sense.  And it only makes sense in that case.

I tried to reproduce some of the common cases (which were more along the
lines of stacked PRs than transplanted series of commits), and you're
right: with the remerge bug fix, the range-diffs are much more easily
interpreted.

For example, I had to squash in a few fixes in
https://github.com/git/git-scm.com/pull/1915, and I try to combine all of
my currently-open git-scm.com PRs into the `gh-pages` branch in my fork.

With `--diff-merges=1`, all of those single-commit merges that update the
various translations of the book had diffs identical to the diffs of the
commits they merged. Which confused range-diff quite a bit, often picking
a non-merge as matching a merge (which is of course suboptimal).

With `--diff-merges=remerge`, these trivial merges had no diffs at all,
which made the review much easier.

> I think `remerge-diff` generally makes sense here, both in the cases
> when `first-parent` makes sense and when `first-parent` does not.  It
> could be improved by suppressing the inclusion of short commit ids
> (and maybe also commit messages) in the labels of conflict markers.  I
> suspect that issue might make `remerge-diff` less useful than
> `first-parent` in simple common cases currently, but I think it's the
> right thing to build upon for what you are trying to view.

Interesting idea about the suppressed commit IDs. I'll play with that.

Ciao,
Johannes

Copy link

gitgitgadget bot commented Nov 11, 2024

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Philippe,

On Sun, 10 Nov 2024, Philippe Blain wrote:

> Le 2024-11-08 à 08:43, Johannes Schindelin via GitGitGadget a écrit :
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > The `git log` command already offers support for including diffs for
> > merges, via the `--diff-merges=<format>` option.
> >
> > Let's add corresponding support for `git range-diff`, too. This makes it
> > more convenient to spot differences between iterations of non-linear
> > contributions, where so-called "evil merges" are sometimes necessary and
> > need to be reviewed, too.
>
> Maybe "between commit ranges that include merge commits" would be more
> workflow-agnostic ?

Good idea, this is much clearer than what I wrote, too.

> > diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> > index fbdbe0befeb..17a85957877 100644
> > --- a/Documentation/git-range-diff.txt
> > +++ b/Documentation/git-range-diff.txt
> > @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers
> >  	Suppress commits that are missing from the second specified range
> >  	(or the "right range" when using the `<rev1>...<rev2>` format).
> >
> > +--diff-merges=<format>::
> > +	Instead of ignoring merge commits, generate diffs for them using the
> > +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +	and include them in the comparison.
> > ++
> > +Note: In the common case, the `first-parent` mode will be the most natural one
> > +to use, as it is consistent with the idea that a merge is kind of a "meta
> > +patch", comprising all the merged commits' patches into a single one.
>
> I think I agree with Elijah that we probably should also highlight at least
> 'remerge'.
>
> Also, is it worth making this a proper Asciidoc "[NOTE]" ? (I'm not sure, there are
> a lot of "Notes:" in our doc that are not Asciidoc "[NOTE]"s.

Right, I did not want to deviate too much from the surrounding style.

Besides, I am not _so_ versed in AsciiDoc, I consider Markdown to be much
more common, so I am much more familiar with that. I had no idea that
there was a proper AsciiDoc [NOTE].

> > diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> > index 1b33ab66a7b..901de5d133d 100644
> > --- a/builtin/range-diff.c
> > +++ b/builtin/range-diff.c
>
> The changes look good to me. Maybe it would be nice to add a corresponding
> 'range-diff.diffMerges' config option to allow users to configure the
> behaviour more permanently ?

Seeing as there are no existing `rangeDiff.*` options, I am loathe to
introduce the first one lest I am asked why I don't balloon this patch
series into introducing config settings for the other options, too.

Ciao,
Johannes

Copy link

gitgitgadget bot commented Nov 12, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <newren@gmail.com> writes:

>> There _may_ need a tweak of the matching algorithm to allow the
>> "same" merge on both sides to match, even if they diverge a lot,
>> though.  A range-diff that pairs a merge in the previous iteration
>> with a single patch in the updated iteration may be hard to read.
>
> Sounds like you are arguing that there is an angle/usecase from which
> `first-parent` makes sense, namely the viewpoint that "a merge is
> merely bringing in lots of changes into the mainline as a single unit"
> as you put it.  What was surprising to me, though, is that it's a
> completely different usecase than the one that was brought up in the
> commit message for this feature, namely "so-called 'evil merges' are
> sometimes necessary and need to be reviewed too".

What I had in mind when I wrote the example you are responding to is
based on what sometimes happens while I make repeated merges (and as
you may know, I make lots of them).  In the first attempt I miss the
fact that I need semantic adjustments and then in the second attempt
I know what additional changes are necessary in the same merge (i.e.
merging exactly the same iteration of the same topic).  If you run
the first-parent range-diff between one iteration of 'seen' and
another, the "additional changes" I would make in the second attempt
would be the only thing that will appear in the output, showing the
"evil merge".

There can also be updates in the topic itself when I rebuild 'seen',
in addition to merge-fixes to adjust for semantic conflicts.  Such a
change would also appear in the first-parent view.  If you used
other views, like dense-combined or remerge-diff, these updates in
the topic itself may be hidden, as these other views are
specifically designed to highlight conflict resolutions and evil
merges by discarding mechanically resolvable changes.  So it all
depends on what you are looking for and what you are deliberately
excluding from the lower level of diff generation when you prepare
your range-diff, which is a "diff of diff".  Giving an impression
that first-parent is the most useful may risk misleading readers in
that sense.

Copy link

gitgitgadget bot commented Nov 12, 2024

This patch series was integrated into seen via git@3dc534b.

Copy link

gitgitgadget bot commented Nov 13, 2024

This patch series was integrated into seen via git@75fb59b.

Copy link

gitgitgadget bot commented Nov 15, 2024

This patch series was integrated into seen via git@e1a3971.

Copy link

gitgitgadget bot commented Nov 16, 2024

This patch series was integrated into seen via git@4a35789.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants