Skip to content

Commit da071fe

Browse files
Fix diffWords crashing when used with an Intl.Segmenter on a text with consecutive newlines (#631)
* Add test for broken case reported in #630 * Fix the bug * Add release notes
1 parent 8cd6e0c commit da071fe

File tree

3 files changed

+34
-1
lines changed

3 files changed

+34
-1
lines changed

release-notes.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,9 @@
11
# Release Notes
22

3+
## Future 8.0.3 release
4+
5+
- [#631](https://github.com/kpdecker/jsdiff/pull/631) - **fix support for using an `Intl.Segmenter` with `diffWords`**. This has been almost completely broken since the feature was added in v6.0.0, since it would outright crash on any text that featured two consecutive newlines between a pair of words (a very common case).
6+
37
## 8.0.2
48

59
- [#616](https://github.com/kpdecker/jsdiff/pull/616) **Restored compatibility of `diffSentences` with old Safari versions.** This was broken in 8.0.0 by the introduction of a regex with a [lookbehind assertion](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookbehind_assertion); these weren't supported in Safari prior to version 16.4.

src/diff/word.ts

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -67,7 +67,24 @@ class WordDiff extends Diff<string, string> {
6767
if (segmenter.resolvedOptions().granularity != 'word') {
6868
throw new Error('The segmenter passed must have a granularity of "word"');
6969
}
70-
parts = Array.from(segmenter.segment(value), segment => segment.segment);
70+
// We want `parts` to be an array whose elements alternate between being
71+
// pure whitespace and being pure non-whitespace. This is ALMOST what the
72+
// segments returned by a word-based Intl.Segmenter already look like,
73+
// and therefore we can ALMOST get what we want by simply doing...
74+
// parts = Array.from(segmenter.segment(value), segment => segment.segment);
75+
// ... but not QUITE, because there's of one annoying special case: every
76+
// newline character gets its own segment, instead of sharing a segment
77+
// with other surrounding whitespace. We therefore need to manually merge
78+
// consecutive segments of whitespace into a single part:
79+
parts = [];
80+
for (const segmentObj of Array.from(segmenter.segment(value))) {
81+
const segment = segmentObj.segment;
82+
if (parts.length && (/\s/).test(parts[parts.length - 1]) && (/\s/).test(segment)) {
83+
parts[parts.length - 1] += segment;
84+
} else {
85+
parts.push(segment);
86+
}
87+
}
7188
} else {
7289
parts = value.match(tokenizeIncludingWhitespace) || [];
7390
}

test/diff/word.js

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -284,6 +284,18 @@ describe('WordDiff', function() {
284284
diffWords('foo', 'bar', {intlSegmenter: segmenter});
285285
}).to['throw']('The segmenter passed must have a granularity of "word"');
286286
});
287+
288+
it("doesn't blow up when using an Intl.Segmenter on a text with a double newline", () => {
289+
// Regression test for https://github.com/kpdecker/jsdiff/issues/630
290+
const englishSegmenter = new Intl.Segmenter('en', {granularity: 'word'});
291+
expect(convertChangesToXML(diffWords(
292+
'A\n\nX',
293+
'B\n\nX',
294+
{intlSegmenter: englishSegmenter}
295+
))).to.equal(
296+
'<del>A</del><ins>B</ins>\n\nX'
297+
);
298+
});
287299
});
288300

289301
describe('#diffWordsWithSpace', function() {

0 commit comments

Comments
 (0)