-
Notifications
You must be signed in to change notification settings - Fork 869
conflict: encode ending EOL conflicts in the materialized conflict #8441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a22b4b9 to
aec3562
Compare
aec3562 to
557ef23
Compare
scott2000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parsing logic looks good to me, but the materialization logic seems a bit complex.
..., so that the double negation with `!has_no_eol(...)` can be avoided to improve the readability.
557ef23 to
99c1174
Compare
| %%%%%%% diff from base to side #2 | ||
| +line 2 | ||
| >>>>>>> conflict 1 of 1 ends |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add test including last hunk with blank lines? I think it would be interesting to see how that would look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure about the exact case you want to test, so I list several cases and the materialized results, all of them can be parsed back to the original 3-way merge as is, LF is denoted as $:
- base:
aa, left:aa\n\n\n, right:aa\n. The conflict doesn't have the ending EOL.
<<<<<<< conflict 1 of 1$
%%%%%%% diff from base to side #1$
aa$
+$
+++++++ side #2$
aa$
$
$
$
>>>>>>> conflict 1 of 1 ends
- base:
aa\n, left:aa\nbb\n\n\n, right:aa\nbb. The conflict doesn't have the ending EOL.
aa$
<<<<<<< conflict 1 of 1$
%%%%%%% diff from base to side #1$
-$
+bb$
+++++++ side #2$
bb$
$
$
$
>>>>>>> conflict 1 of 1 ends
- base:
aa\n, left:aa\nbb, right:aa\nbb\n\n\n. The conflict doesn't have the ending EOL.
aa$
<<<<<<< conflict 1 of 1$
+++++++ side #1$
bb$
$
$
$
%%%%%%% diff from base to side #2$
-$
+bb$
>>>>>>> conflict 1 of 1 ends
- base:
aa\n\n\n, left:aa, right:aa\n. The conflict doesn't have the ending EOL.
<<<<<<< conflict 1 of 1$
%%%%%%% diff from base to side #1$
aa$
$
-$
-$
+++++++ side #2$
aa$
>>>>>>> conflict 1 of 1 ends
- base:
aa\n\n\n, right:aa\n, left:aa. The conflict doesn't have the ending EOL.
<<<<<<< conflict 1 of 1$
+++++++ side #1$
aa$
%%%%%%% diff from base to side #2$
aa$
$
-$
-$
>>>>>>> conflict 1 of 1 ends
Let me know the precise cases you want to add to tests, and I will add them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant something like "all sides end with EOL" and one side has blank line at the end of file. I suspect that it would look very similar to "one side lacks newline".
This might cause trouble because some editors automatically append newline when saving. I don't know whether it would be annoying if blank lines or newlines were unexpectedly added, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking something like base aa\n, side1 aa\ncc\n\n, side2 aa\nbb\n?
This PR doesn't touch that code path, because it doesn't have an ending EOL conflict, since all sides end with EOL, and it will be materialized to
aa␊
<<<<<<< side #1␊
cc␊
␊
||||||| base␊
=======␊
bb␊
>>>>>>> side #2␊And it can be correctly parsed back to the following Merge<String>:
Conflicted(
[
"aa\ncc\n\n",
"aa\n",
"aa\nbb\n",
],
)
If you are talking about editor adding an ending EOL, when the materialized conflict doesn't have the ending EOL. The parsed conflict will add an extra new line character to every side. For example:
- base:
aa\n - side1:
aa\ncc\n\n - side2:
aa\nbb - The materialized conflict:
aa␊ <<<<<<< side #1␊ cc␊ ␊ ␊ ||||||| base␊ ␊ =======␊ bb␊ >>>>>>> side #2
- And if the editor adds an ending EOL to the conflict, the parsed
Merge<String>is:Conflicted( [ "aa\ncc\n\n\n", "aa\n\n", "aa\nbb\n", ], )
If that's not what you meant, can you give a concrete example and the expectation? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking something like base
aa\n, side1aa\ncc\n\n, side2aa\nbb\n?
Yes, thanks. I'm asking something like that because it will help discuss UX of new EOL conflict format.
If you are talking about editor adding an ending EOL, when the materialized conflict doesn't have the ending EOL. The parsed conflict will add an extra new line character to every side.
Yes, that could be a bit annoying, and old format wouldn't have this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that could be a bit annoying, and old format wouldn't have this problem.
True. But the old format is not perfect as well:
- It can generate unexpected "(removes terminating newline)" comment. See FR: Represent more conflict state within the materialized file #5496 (comment).
- The old implementation doesn't allow modification to the ending EOL of different sides(IMO, this reason is weak, because this has little actual UX impact. Maybe it improves
jj fix, because now the fix tool can modify the ending EOL).
So it's a trade off, and I think we are Ok to be less considerate with the editors that always adds the ending EOL, because:
- that editor can't edit the ending EOL of any files anyway, so it's Ok that we don't support such editor to resolve an ending EOL conflict. Nonetheless, the user can still use this editor to change all sides of conflicts to have the ending EOL.
- if the editor is used univsersally in this repo, it's unlikely that an ending EOL conflict could happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that user can edit unrelated part of the conflicted file without noticing that the last conflict marker lacks newline. I agree it's a trade off. I just assume that it would be rare to fix missing terminal newline without resolving conflicts.
if the editor is used univsersally in this repo, it's unlikely that an ending EOL conflict could happen.
Pulled changes could have mixed ending EOL states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that user can edit unrelated part of the conflicted file without noticing that the last conflict marker lacks newline.
Then what about the following ideas:
- Add explicit message in the conflict to notify the user that they encounter an ending EOL conflict.
- Add a warning message in CLI when an ending EOL conflict is created, e.g. when the user executes
jj new xxxx yyyy.
I also don't think it's too bad if the user doesn't notice that the last conflict marker lacks newline. When the user manually resolve the conflict, they have to choose whether the file ends with newline or not, because this file ends with a conflict hunk.
I just assume that it would be rare to fix missing terminal newline without resolving conflicts.
If we only focus on the experience on resolving the terminal newline conflict, with this PR or not, it has little difference. The user just sees the modification from different sides in the last hunk, and decide whether the resolved file should be ended with an EOL manually. And I would argue, if we add back the old "(removes terminating newline)" label, the UX has almost no difference.
-
old:
aa␊ <<<<<<< conflict 1 of 1␊ %%%%%%% diff from: base␊ \\\\\\\ to: side #1 (no terminating newline)␊ +bb␊ +++++++ side #2␊ cc␊ >>>>>>> conflict 1 of 1 ends␊ -
with this PR and if we add the conflict comment back
aa␊ <<<<<<< conflict 1 of 1␊ +++++++ side #1 (no terminating newline)␊ bb␊ %%%%%%% diff from base to side #2␊ +cc␊ ␊ >>>>>>> conflict 1 of 1 ends
The biggest motivation of this PR is that, once we add gitattributes filter support, and when snapshotting a conflict, we want to convert the contents of different side before check-in, the current implementation makes it difficult to decide the whether the original side of the contents have the terminal EOL: you have to read back from the tree, apply the smudge filter, before checking whether the original contents have the EOL. Please see #8266 (comment) for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just assume that it would be rare to fix missing terminal newline without resolving conflicts.
If we only focus on the experience on resolving the terminal newline conflict, with this PR or not, it has little difference. The user just sees the modification from different sides in the last hunk, and decide whether the resolved file should be ended with an EOL manually. And I would argue, if we add back the old "(removes terminating newline)" label, the UX has almost no difference.
Adding back "terminating newline" comments would definitely help, but my point is that it seems unusual that fixing up missing terminal newline actually adds blank line (to some sides of the last hunk.)
Oh, btw, we need to add format!("{data}[EOF]\n") or something to insta snapshots when terminal newline matters.
Add a warning message in CLI when an ending EOL conflict is created,
I don't think CLI warning is needed. It would be rare that ending EOL conflicts.
The biggest motivation of this PR is that, once we add
gitattributesfiltersupport, and when snapshotting a conflict, we want to convert the contents of different side before check-in, the current implementation makes it difficult to decide the whether the original side of the contents have the terminal EOL: you have to read back from the tree, apply thesmudgefilter, before checking whether the original contents have the EOL. Please see #8266 (comment) for details.
Yeah, filter support is an interesting problem. Suppose internal (auto) merges won't apply filter, the checked-out content will have to have conflicts even if conflicts in filtered content could be resolved automatically. I have no idea about how filters can be processed, but I'm okay with this PR if it helps. As I said, ending EOL conflicts would be pretty rare, so we shouldn't care too much about the UX.
99c1174 to
8159213
Compare
When we encounter an ending EOL conflicts, we currently read from the store to decide whether each side ends with EOL. This approach has 2 issues: * The user can't partially resolve this ending EOL conflict. * When we introduce the `filter` `gitattributes` support and apply the conversion on the different sides of the conflict, on snapshot, we have to use the converted contents to decide the ending EOL of every side. This change makes use the ending EOL of the conflict and extra EOLs on each side to encode the ending EOL conflict in the conflict.
a0fcc67 to
3cf23db
Compare
|
Added the LF/CRLF logic. PTAL. If we find that too complicated and should be in a separate PR, please also let me know. Thanks. Let's also make sure that:
Should I modify the change log? If so, which section should this PR be in, Breaking changes or New features? This change is breaking, but not as breaking from the CLI perspective. To what level of details should I include in the change log? Thanks. |
e8ef469 to
a8f75c1
Compare
lib/src/conflicts.rs
Outdated
| } | ||
|
|
||
| fn detect_eol(single_hunk: &Merge<impl AsRef<[u8]>>) -> &'static str { | ||
| for side in single_hunk { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a bit strange to me that the line ending chosen depends on the order of the terms of the conflict; I would expect that rearranging conflict sides wouldn't change the line endings used for conflict markers.
Maybe we could treat line endings similarly to executable state, and resolve them using Merge::resolve_trivial, defaulting to LF if they can't be resolved?
@yuja, what do you think?
Also, we might want to convert the entire file to use the selected line endings instead of just using the selected line endings for conflict markers, since it's strange if we materialize a file with mixed line endings (e.g. if one side of the conflict uses LF and the other side uses CRLF). But this would require remembering the original line endings for the terms and converting the line endings back to the original line endings when parsing conflicts. It could also be good to include a comment like (uses CRLF line ending) in the conflict label for terms which used CRLF when the conflict is materialized with LF in the file (similar to the current (no terminating newline) comment).
However, if you think about it, those steps are almost identical to how we currently handle missing terminating newlines. This makes me doubt the changes proposed in this PR, since it seems like it could be more consistent to keep the current implementation if we plan to implement CRLF conversion in a similar way. Like maybe "LF" vs "CRLF" and "terminating newline" vs "no terminating newline" are just two aspects of line ending information which can be handled together in the same way as each other.
But I suppose one argument for making these proposed changes anyway would be that LF vs CRLF is about file encoding, whereas missing terminating newline is about file content, so it might make sense that they should be handled differently?
Edit: I think we had discussed this a bit previously after I proposed a similar implementation in a previous issue: #3968 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could treat line endings similarly to executable state, and resolve them using
Merge::resolve_trivial, defaulting toLFif they can't be resolved?
Yes, that sounds nicer. We can of course improve the EOL detection later.
Also, we might want to convert the entire file to use the selected line endings instead of just using the selected line endings for conflict markers, since it's strange if we materialize a file with mixed line endings (e.g. if one side of the conflict uses LF and the other side uses CRLF).
Couldn't that resolve conflicts unexpectedly if there were only LF/CRLF conflicts? Suppose internal merge machinery wouldn't transform line endings, the materialized conflict will have to be distinct about line endings.
FWIW, it's generally preferred to send a new PR if it fixes another problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't that resolve conflicts unexpectedly if there were only LF/CRLF conflicts?
That's true, but we also need to solve that for executable-bit-only conflicts. Maybe we need a more general design like #5496 before we implement full CRLF support then.
In the meantime, it sounds good to me to just use CRLF for the markers themselves when it would be appropriate and not touch the contents of the conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to convert the entire file to use the selected line endings instead of just using the selected line endings for conflict markers
-
The scope of this change is only about EOL of conflict markers, because they are what
jjgenerates, and we have an issue wherejjgenerates LF for conflict markers lines in files with only CRLF EOL. Converting existing contents is out of scope of this PR. -
IMO, the responsibility to convert the EOL of other parts of the file should be handled by the following mechanism:
- The existing
working-copy.eol-conversionsetting, if the user is Ok with always storingLFinStore. - A new builtin
jj fixtool, if the user needs to have a better control over when the conversion should happen. This can be something similar togit add --renormalize. - The future
filtergitattributessupport. Bothjj fixandfiltergitattributessupport provide highly customizability to the user. - A more general merge hook mechanism(e.g., similar to the
mergegitattributesin git, which is a good to have forgit-lfssupport), if we want to introduce a merge algorithm for EOL specifically.
All the mentioned mechanism works at a different stage rather than when the conflict is materialized. This helps conflict materialization to be decoupled from the EOL conversion logic.
- The existing
But I suppose one argument for making these proposed changes anyway would be that LF vs CRLF is about file encoding, whereas missing terminating newline is about file content, so it might make sense that they should be handled differently?
My argument for having this change is that:
- Automatically resolving the file EOL conflicts should not be part of the
conflictsmodule job. As I have mentioned, they should be handled by other mechanisms likeworking-copy.eol-conversionsetting,jj fix, etc. - If the user wants to manually resolve the file EOL conflicts, they should just use an editor that supports EOL conversion.
- To avoid seeing a large conflict hunk only due to EOL changes,
jj fixor an external merge tool is probably a better tool.jjcan still just materialize the conflict with huge hunks. While this can be true for the ending EOL conflict, the current implementation to tell whether every side of the conflict ends with EOL can influence the futuregitattributesfilterfeature. See Gitattributes EOL conversion support #8266 (comment). - The conflict marker EOL change has an actual UX impact for Visual Studio users who always check in CRLF files into the repo. See Conflicts are always added with LF, even in files using CRLF #7376 for details.
Maybe we could treat line endings similarly to executable state, and resolve them using
Merge::resolve_trivial, defaulting to LF if they can't be resolved?
First, I don't think it really improves many situations. The only rare case where the improvement could help is when the file is partially resolved, and all the rest of the file is using the same EOLs, e.g.,
- base:
aa\r\nbb\ncc\n - side1:
aa\r\nbb\ncc\nxx\n - side2:
bb\ncc\nyy\n
In this case, this PR results in conflict markers with CRLF EOL, but those are the only lines that use CRLF in the materialized conflict. Merge::resolve_trivial may be better, since it should result in LF, because it sees side 1 remains mixed EOL, and side 2 changes mixed EOL to LF.
For other cases, the materialized conflict(excluding the conflict markers lines) has the EOL chosen by the conflict markers, so the UX won't be improved. For example,
- base:
aa\nbb\n - side1:
aa\n - side2:
aa\r\nbb\r\n
In this case, the user will see a conflict hunk with both CRLF(the side 2 diff hunk) and LF(the base and side 1 diff hunk). CRLF or LF is used by the conflict marker probably doesn't matter. This PR choses LF. Merge::resolve_trivial chooses CRLF, but I don't think this is a UX improvement.
Furthermore, to actually improve the UX, it's not as simple. See for some examples:
-
If at least one side doesn't have EOL, what EOL should we resolve to?
- base:
aa\nbb\n - side1:
aa - side2:
aa\r\nbb\r\n
Intuitively, the EOL should resolve to CRLF, because that's supposed to be the only EOL change, but
Merge::resolve_trivialsees EOL changed on both sides: side 1 changes to no EOL, side 2 changes to CRLF. The LF default doesn't work here.or
- base:
aa\r\nbb\r\n - side1:
aa - side2:
bb
This case is especially interesting, because
Merge::resolve_trivialwill resolve to "no eol", and default to LF, while the current implementation chooses CRLF, which seems to be more correct. - base:
-
If at least one side has mixed EOLs, what EOL should we resolve to?
- base:
aa\r\nbb\r\ncc\r\n - side1:
aa\r\ncc\r\nxx\r\n - side2:
aa\r\nbb\r\ncc\r\nyy\n
In this particular case,
Merge::resolve_trivialwill probably resolve to mixed EOL, and may then default to LF, but CRLF is probably the better UX: the user is likely accidentally uses LF in side 2. - base:
Unless we think Merge::resolve_trivial is a clear winner for many cases, and we can come up with a concrete test case where this PR is defeated by Merge::resolve_trivial, I think the current PR is good enough until we receive an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scope of this change is only about EOL of conflict markers, because they are what
jjgenerates, and we have an issue wherejjgenerates LF for conflict markers lines in files with only CRLF EOL. Converting existing contents is out of scope of this PR.
Makes sense to me.
Unless we think
Merge::resolve_trivialis a clear winner for many cases, and we can come up with a concrete test case where this PR is defeated byMerge::resolve_trivial, I think the current PR is good enough until we receive an issue.
My main issue with the current implementation is that for this case where CRLF is on side 1, we would pick CRLF:
- base:
base\n - side 1:
left\r\n - side 2:
right\n
But for this case where CRLF is on side 2, we would pick LF:
- base:
base\n - side 1:
left\n - side 2:
right\r\n
This seems inconsistent to me. If we use Merge::resolve_trivial, it would consistently pick CRLF for both of these cases. Alternatively, maybe we could use CRLF only if all terms (or maybe just added terms) use CRLF (excluding terms which don't contain newlines), so we would pick LF for both of these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick question: would it make more sense to have a repo-wide setting, rather than file by file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick question: would it make more sense to have a repo-wide setting, rather than file by file?
IMO, it may not be worthwhile, because:
- A dedicated setting for the EOL of conflict markers seems to be an over-kill for me.
- It can add to confusion with the existing
working-copy.eol-conversionsetting. - It only improves UX for the rare case where we are materializing a conflict where all sides don't have any EOL.
But I will defer the decision to the maintainers.
EDIT:
The worst scenario I could imagine is that, a user tries to edit a conflict where all sides don't have an EOL, but not resolve it. The user changes the conflict markers to use the CRLF EOL. The user's modification doesn't add any EOLs to any sides. When the user snapshots the change, moves to a different change, and moves back, the conflict markers are now reluctantly using LF EOL again.
But I don't think it's too much of an issue:
- It's rare to see ending EOL conflicts, and it's even rarer to see a conflict where all sides don't have an EOL.
- It's unlikely that users care about what EOL the conflict marker lines use when the file itself doesn't have any EOLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also simplify it a bit using itertools
all_equal_value()maybe?
Changed to use all_equal_value(), and my tests don't catch any weird behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick question: would it make more sense to have a repo-wide setting, rather than file by file?
IMO, it may not be worthwhile, because:
I think you misunderstood my question. I was thinking in terms of a repo having an EOL choice for the repo, not the conflict markers. Like the repo for Windows software would have one setting and the repo for Linux software would have another setting. But then I guess there are lots of repos that have a mixture for both outputs?
On some other EOL issue, I was thinking your choice of "input" and "output" was problematic, and I think this is why. If the ultimate destination for the repo is for Windows, it shouldn't be changed all the time when touched from Linux, and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking in terms of a repo having an EOL choice for the repo, not the conflict markers.
Then it's out of the scope of this PR and the original issue. The EOL of the conflict markers is what the last commit of the PR and the original issue is all about. While we can continue discussion in this thread, we can use the original issue or open a new issue for this question.
On some other EOL issue, I was thinking your choice of "input" and "output" was problematic, and I think this is why.
The current design of the working-copy.eol-conversion setting exists, only because it's a compatibility feature of the git core.autocrlf config, so the choice was also made by git, not only by me.
If the ultimate destination for the repo is for Windows, it shouldn't be changed all the time when touched from Linux, and vice versa.
The vice versa case should already have been well handled by the existing "working-copy.eol-conversion" setting, or am I missing some cases? If that's the case I am wondering if you could come up with a complete user journey so that we can tell if existing jj can actually cover the use case?
For editting the Windows repos on Linux machines case, we don't support the "modify with any EOLs, convert to CRLF EOL on snapshot, store with the CRLF EOL" option, but there are alternatives:
- Set the
working-copy.eol-conversionsetting toinput-outputon Windows, and toinputon Linux. All files will be stored with the LF EOL. But when checking out on Windows, you have the CRLF EOL files. This is also the suggestion from the Pro Git Book. Whilejjstill works if you store files with CRLF EOL,jjis just more battle tested with the case where LF EOL is used. For example, tests that use LF EOL dominate the repo, and that may simply because Rust multiple line string literal always uses LF regardless of the EOL of the source file. (BTW, Rust is not the only language that does this. Python1 and C++2 also work in a similar way). - Set the
working-copy.eol-conversionsetting tononeon both Windows and Linux. Store all files with the CRLF EOL in the repo. Just to be careful to always use the CRLF EOL on Linux. If the repo is a Windows repo, and people want to edit that on Linux, it won't be an unreasonable ask to ask the Linux person to use a tool that is compatible with Windows EOL. I agree it is annoying, but we should also measure the trade-off between the complexity of thejjproject and the usability. - Use
jj fixorjj runto sanitize the EOL to CRLF and take the path of option 2. It's just one more command to run before upload the change.jjseems to have pre-upload hooks in the future per FR: Generalized hook support #3577. Then the extrajj fixcommand before uploading can also go away. This should close the gap for most of the situations.
If 3. sounds good, we can ask maintainers' opinions on a built in tool to fix EOL. I may have missed important use cases that all the alternatives don't work well. Please let me know. Thanks.
Footnotes
-
From https://docs.python.org/3/reference/lexical_analysis.html#physical-lines: Regardless of platform, each of these sequences is replaced by a single ASCII LF (linefeed) character. (This is done even inside string literals.) Each line can use any of the sequences; they do not need to be consistent within a file. ↩
-
From https://en.cppreference.com/w/cpp/language/translation_phases.html: The individual bytes of the source code file are mapped (in implementation-defined manner) to the characters of the basic source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters. ↩
a3f4cbf to
a3cab88
Compare
If the original file uses the CRLF EOL, the materialized conflict uses CRLF as well. This change also adds tests for parsing conflicts with the CRLF EOL. Fix jj-vcs#7376.
a3cab88 to
8fbe9ec
Compare
scott2000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good to me, thanks. I left a few comments on the documentation since I think it's important to explain this well to avoid confusion.
It also might be nice to split the CRLF changes into another PR as @yuja mentioned.
| generally ignore this comment and resolve the conflict normally. | ||
| a missing terminating newline character in a conflict, it materializes the | ||
| terminating newline character conflict in a special way. The conflict won't end | ||
| with `\n`, and the last `\n` on every side won't be considered as the contents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation doesn't seem very clear to me. Maybe we should rewrite this whole section to provide more background? For instance, something like this maybe:
When materializing conflicts, `jj` outputs them in a line-based format, with
each conflict marker line terminated by a newline character (`\n`). This format
works well for text files that consist of a series of lines, with each line
terminated by a newline character as well.
While most text files follow this convention, some do not. Since conflict
markers must appear on their own line, when `jj` encounters a missing
terminating newline character in a conflict, it still needs to ensure that
there is a newline before the next conflict marker. To do this, it adds an
extra newline to each term of the conflict. Then, to compensate for this extra
newline, it also omits the terminating newline from the "end of conflict"
marker (`>>>>>>>`).| with `\n`, and the last `\n` on every side won't be considered as the contents | ||
| of that side. | ||
|
|
||
| You can resolve this conflict by end the file with `\n` or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line might suggest to the user that they only need to add the newline to resolve the entire conflict. Maybe we can just remove this line, or maybe it would be more clear to say that they should resolve the conflict as normal, but they should pay extra attention to the terminating newline character.
| character, and one person changed `grape` to `grapefruit`, while another person | ||
| added the missing newline character to make `grape\n`, the resulting conflict | ||
| would look like this: | ||
| would look like this(␊ demonstrates the `\n` character explicitly, note that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a space between this and (
|
|
||
| Therefore, a resolution of this conflict could be `grapefruit\n`, with the | ||
| terminating newline character added. | ||
| terminating newline character added; or just `grapefruit` without the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for clarity we shouldn't add this. The "correct" resolution of this conflict would normally be to include the newline, since that was the intention of the user who made the change on side 2. If we omit the terminating newline character in the result, that would mean we're completely ignoring the changes from side 2.
lib/src/conflicts.rs
Outdated
| /// * It doesn't have any lines. | ||
| /// * It's the only contents that don't end with `LF` and won't be confused with | ||
| /// the conflict mark. | ||
| fn has_ending_eol(slice: &[u8]) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this entire commit isn't needed anymore since the next commit inlines the function anyway.
lib/src/conflicts.rs
Outdated
| } | ||
|
|
||
| /// Write a conflict marker to an output file with an ending `LF`. | ||
| fn writeln_conflict_marker( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since we're going to explicitly write the newlines in the next commit anyway, maybe we can inline this function in this commit as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should add a changelog entry for this (I think you may have mentioned this in another comment, but I couldn't find it now). It might be considered a breaking change since tools could've been relying on conflict markers ending with a newline? But I'm not sure.
When we encounter an ending EOL conflicts, we currently read from the store to decide whether each side ends with EOL. This approach has 2 issues:
filtergitattributessupport and apply the conversion on the different sides of the conflict, on snapshot, we have to use the converted contents to decide the ending EOL of every side.This change makes use the ending EOL of the conflict and extra EOLs on each side to encode the ending EOL conflict in the conflict.
Checklist
If applicable:
CHANGELOG.mdREADME.md,docs/,demos/)I have updated the config schema (cli/src/config-schema.json)This change hasn't polished the following aspect, but I will make the modification if we agree with that this new approach should replace the old approach in #5186 to fix #3968:
CRLFandLF.has_no_eoltoends_with_eol. We have some!has_no_eol(...)which is double negation now.Polish the conflict comment to warn the user that the ending EOL could influence the partial conflict resolution.Add warning message in CLI in case an ending EOL conflict is created.