Commit 41bde38

cpp: Fix highlighting of unterminated raw strings
PR highlightjs#1897 switched C++ raw strings to use backreferences; however, this breaks source files where raw strings are truncated. As with unterminated comments, it would be preferable to still highlight them. Instead, go back to using separate begin and end regexps, but introduce an endFilter feature to filter out false positive matches. Internally this works similarly to endSameAsBegin. See also issue highlightjs#2259.
1 parent d95ace7 commit 41bde38

7 files changed: +55 -8 lines changed

docs/reference.rst

Lines changed: 21 additions & 1 deletion
@@ -186,7 +186,7 @@ endSameAsBegin
 Acts as ``end`` matching exactly the same string that was found by the
 corresponding ``begin`` regexp.
 
-For example, in PostgreSQL string constants can uee "dollar quotes",
+For example, in PostgreSQL string constants can use "dollar quotes",
 consisting of a dollar sign, an optional tag of zero or more characters,
 and another dollar sign. String constant must be ended with the same
 construct using the same tag. It is possible to nest dollar-quoted string
@@ -204,6 +204,26 @@ In this case you can't simply specify the same regexp for ``begin`` and
 ``end`` (say, ``"\\$[a-z]\\$"``), but you can use ``begin: "\\$[a-z]\\$"``
 and ``endSameAsBegin: true``.
 
+.. _endFilter:
+
+endFilter
+^^^^^^^^^
+
+**type**: function
+
+Filters ``end`` matches to implement end rules that cannot be expressed as a
+standalone regular expression.
+
+This should be a function which takes two string parameters, the string that
+matched the ``begin`` regexp and the string that matched the ``end`` regexp. It
+should return true to end the mode and false otherwise.
+
+For example, C++11 raw string constants use syntax like ``R"tag(.....)tag"``,
+where ``tag`` is any zero to sixteen character string that must be repeated at
+the end. This could be matched with a single regexp containing backreferences,
+but truncated raw strings would not highlight. Instead, ``endFilter`` can be
+used to reject ``)tag"`` delimiters which do not match the starting value.
+
 .. _lexemes:
 
 lexemes
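
The contract documented in this hunk (two strings in, a boolean out) can be illustrated with a small standalone sketch. The function name `rawStringEndFilter` is hypothetical, and the tag extraction assumes that `begin` has the shape `R"tag(` (optionally with a `u8`, `u`, `U` or `L` prefix) and that `end` has the shape `)tag"`.

```javascript
// Hypothetical helper name; follows the documented contract:
// (string matched by begin, string matched by end) -> boolean.
function rawStringEndFilter(begin, end) {
  // Assumed shape of begin: R"tag(  (possibly prefixed by u8, u, U or L).
  var tagStart = begin.indexOf('"') + 1;
  var beginTag = begin.substring(tagStart, begin.length - 1);
  // Assumed shape of end: )tag"
  var endTag = end.substring(1, end.length - 1);
  return beginTag === endTag;
}

console.log(rawStringEndFilter('R"foo(', ')foo"'));   // true: tags agree, the mode ends
console.log(rawStringEndFilter('R"foo(', ')nope"'));  // false: false positive, the mode stays open
console.log(rawStringEndFilter('u8R"(', ')"'));       // true: an empty tag on both sides
```

Returning false leaves the mode open, which is what allows a truncated raw string to stay highlighted up to the end of the input.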

src/highlight.js

Lines changed: 10 additions & 6 deletions
@@ -501,15 +501,19 @@ https://highlightjs.org/
       return new RegExp(value.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), 'm');
     }
 
-    function endOfMode(mode, lexeme) {
-      if (testRe(mode.endRe, lexeme)) {
+    function endOfMode(mode, matchPlusRemainder, lexeme) {
+      var modeEnded = testRe(mode.endRe, matchPlusRemainder);
+      if (modeEnded && mode.endFilter) {
+        modeEnded = mode.endFilter(mode.beginValue, lexeme);
+      }
+      if (modeEnded) {
         while (mode.endsParent && mode.parent) {
           mode = mode.parent;
         }
         return mode;
       }
       if (mode.endsWithParent) {
-        return endOfMode(mode.parent, lexeme);
+        return endOfMode(mode.parent, matchPlusRemainder, lexeme);
       }
     }
 
@@ -585,9 +589,9 @@ https://highlightjs.org/
       mode_buffer = '';
     }
 
-    function startNewMode(mode) {
+    function startNewMode(mode, lexeme) {
       result += mode.className? buildSpan(mode.className, '', true): '';
-      top = Object.create(mode, {parent: {value: top}});
+      top = Object.create(mode, {parent: {value: top}, beginValue: {value: lexeme}});
     }
 
 
@@ -617,7 +621,7 @@ https://highlightjs.org/
     function doEndMatch(match) {
       var lexeme = match[0];
       var matchPlusRemainder = value.substr(match.index);
-      var end_mode = endOfMode(top, matchPlusRemainder);
+      var end_mode = endOfMode(top, matchPlusRemainder, lexeme);
       if (!end_mode) { return; }
 
       var origin = top;
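
To make the control flow of the patched `endOfMode` easier to follow, here is a simplified standalone sketch. It is not the library's code: `endOfModeSketch` is a hypothetical name, the inline regexp test stands in for `testRe` under the assumption that a match must start at index 0, and the mode object carries only the fields this hunk touches.

```javascript
// Simplified stand-in for the patched endOfMode (hypothetical name).
function endOfModeSketch(mode, matchPlusRemainder, lexeme) {
  // Assumption: an end match only counts if it starts at index 0.
  var m = mode.endRe && mode.endRe.exec(matchPlusRemainder);
  var modeEnded = !!m && m.index === 0;
  // The filter can only veto a regexp match, never create one.
  if (modeEnded && mode.endFilter) {
    modeEnded = mode.endFilter(mode.beginValue, lexeme);
  }
  if (modeEnded) {
    while (mode.endsParent && mode.parent) {
      mode = mode.parent;
    }
    return mode;
  }
  if (mode.endsWithParent) {
    return endOfModeSketch(mode.parent, matchPlusRemainder, lexeme);
  }
}

var rawString = {
  endRe: /\)[^()\\ ]{0,16}"/,
  endFilter: function (begin, end) {
    var tag = begin.substring(begin.indexOf('"') + 1, begin.length - 1);
    return end === ')' + tag + '"';
  }
};

// As in the patched startNewMode, beginValue is attached when the mode is entered.
var top = Object.create(rawString, {
  parent: {value: null},
  beginValue: {value: 'R"foo('}
});

// The regexp matches ')nope"', but the filter rejects it: no mode ends.
console.log(endOfModeSketch(top, ')nope"\nStill not completed.\n', ')nope"') === undefined); // true
// The regexp matches ')foo"' and the filter accepts it: the raw string mode ends.
console.log(endOfModeSketch(top, ')foo";', ')foo"') === top); // true
```

Storing `beginValue` on the per-instance object created by `Object.create` keeps the shared mode definition untouched, so nested or repeated uses of the same mode each see their own opening lexeme.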

src/languages/cpp.js

Lines changed: 10 additions & 1 deletion
@@ -27,7 +27,16 @@ function(hljs) {
         begin: '(u8?|U|L)?\'(' + CHARACTER_ESCAPES + "|.)", end: '\'',
         illegal: '.'
       },
-      { begin: /(?:u8?|U|L)?R"([^()\\ ]{0,16})\((?:.|\n)*?\)\1"/ }
+      {
+        begin: /(?:u8?|U|L)?R"[^()\\ ]{0,16}\(/,
+        end: /\)[^()\\ ]{0,16}"/,
+        endFilter: function(begin, end) {
+          var quote = begin.indexOf('"');
+          var beginDelimiter = begin.substring(quote + 1, begin.length - 1);
+          var endDelimiter = end.substring(1, end.length - 1);
+          return beginDelimiter == endDelimiter;
+        },
+      }
     ]
   };
 

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+<span class="hljs-comment">/*
+Truncated block comment
+</span>
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+/*
+Truncated block comment
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+<span class="hljs-string">R"foo(
+Truncated raw string
+)nope"
+Still not completed.
+</span>
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+R"foo(
+Truncated raw string
+)nope"
+Still not completed.
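
The raw-string fixture above can be checked by hand with a small scanner built from the two regexps in the `cpp.js` change. This is only a sketch of the begin/end/filter idea, not how highlight.js walks its input: `findRawStringEnd` is a hypothetical helper, and the `complete` sample string is invented for illustration.

```javascript
var BEGIN = /(?:u8?|U|L)?R"[^()\\ ]{0,16}\(/;
var END = /\)[^()\\ ]{0,16}"/g;

// Hypothetical helper: returns the index just past the raw string's
// terminator, or -1 when the raw string is truncated.
function findRawStringEnd(source) {
  var begin = BEGIN.exec(source);
  if (!begin) { return -1; }
  var opener = begin[0];
  var tag = opener.substring(opener.indexOf('"') + 1, opener.length - 1);
  END.lastIndex = begin.index + opener.length;
  var end;
  while ((end = END.exec(source)) !== null) {
    if (end[0] === ')' + tag + '"') {
      return end.index + end[0].length;
    }
    // Rejected candidate: resume right after its ')' so that an
    // overlapping candidate is not skipped.
    END.lastIndex = end.index + 1;
  }
  return -1;
}

// Same content as the fixture: ')nope"' does not close R"foo(.
var truncated = 'R"foo(\nTruncated raw string\n)nope"\nStill not completed.\n';
// Invented sample: a false positive followed by the real terminator.
var complete = 'R"foo(\nbody )nope" more\n)foo" + 1';

console.log(findRawStringEnd(truncated)); // -1
console.log(complete.slice(0, findRawStringEnd(complete)).slice(-5)); // )foo"
```

With the previous single-regexp rule, the truncated input would produce no match at all; with separate begin and end rules plus a filter, the string mode simply stays open to the end of the input, which matches the expected markup in the fixture.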
