Commit 9e4f2dc

fix(ts/js) use identifier to match potential keywords (#2519)
- (parser) Adds `keywords.$pattern` key to grammar definitions
- `lexemes` is now deprecated in favor of `keywords.$pattern` key
- enh(typescript) use identifier to match potential keywords, preventing false positives
- enh(javascript) use identifier to match potential keywords, preventing false positives
1 parent 33d3afe commit 9e4f2dc

59 files changed: 181 additions and 129 deletions

CHANGES.md

Lines changed: 6 additions & 3 deletions
@@ -2,6 +2,7 @@
 
 Parser Engine:
 
+- (parser) Adds `keywords.$pattern` key to grammar definitions (#2519) [Josh Goebel][]
 - (parser) Adds SHEBANG utility mode [Josh Goebel][]
 - (enh) Added `on:begin` callback for modes (#2261) [Josh Goebel][]
 - (enh) Added `on:end` callback for modes (#2261) [Josh Goebel][]
@@ -10,10 +11,12 @@ Parser Engine:
 
 Deprecations:
 
-- (deprecation) `endSameAsBegin` is now deprecated. (#2261) [Josh Goebel][]
+- `lexemes` is now deprecated in favor of `keywords.$pattern` key (#2519) [Josh Goebel][]
+- `endSameAsBegin` is now deprecated. (#2261) [Josh Goebel][]
 
 Language Improvements:
-
+- enh(typescript) use identifier to match potential keywords, preventing false positives (#2519) [Josh Goebel][]
+- enh(javascript) use identifier to match potential keywords, preventing false positives (#2519) [Josh Goebel][]
 - [enh] Add `OPTIMIZE:` and `HACK:` to the labels highlighted inside comments [Josh Goebel][]
 - enh(typescript/javascript/coffeescript/livescript) derive ECMAscript keywords from a common foundation (#2518) [Josh Goebel][]
 - enh(typescript) add setInterval, setTimeout, clearInterval, clearTimeout (#2514) [Josh Goebel][]
@@ -30,7 +33,7 @@ Language Improvements:
 [Vania Kucher]: https://github.com/qWici
 
 
-## Version 10.0.2 (pending)
+## Version 10.0.2
 
 Browser build:
 

docs/language-guide.rst

Lines changed: 18 additions & 10 deletions
@@ -64,17 +64,19 @@ and most interesting parsing happens inside tags.
 Keywords
 --------
 
-In the simple case language keywords are defined in a string, separated by space:
+In the simple case language keywords can be defined with a string, separated by space:
 
 ::
 
   {
     keywords: 'else for if while'
   }
 
-Some languages have different kinds of "keywords" that might not be called as such by the language spec
-but are very close to them from the point of view of a syntax highlighter. These are all sorts of "literals", "built-ins", "symbols" and such.
-To define such keyword groups the attribute ``keywords`` becomes an object each property of which defines its own group of keywords:
+Some languages have different kinds of "keywords" that might not be called as
+such by the language spec but are very close to them from the point of view of a
+syntax highlighter. These are all sorts of "literals", "built-ins", "symbols"
+and such. To define such keyword groups the attribute ``keywords`` becomes an
+object each property of which defines its own group of keywords:
 
 ::
 
@@ -85,19 +87,25 @@ To define such keyword groups the attribute ``keywords`` becomes an object each
     }
   }
 
-The group name becomes then a class name in a generated markup enabling different styling for different kinds of keywords.
+The group name becomes the class name in the generated markup enabling different
+theming for different kinds of keywords.
 
-To detect keywords highlight.js breaks the processed chunk of code into separate words, a process called lexing.
-The "word" here is defined by the regexp ``[a-zA-Z][a-zA-Z0-9_]*`` that works for keywords in most languages.
-Different lexing rules can be defined by the ``lexemes`` attribute:
+To detect keywords highlight.js breaks the processed chunk of code into separate
+words, a process called lexing. By default "words" are matched with the regexp
+``\w+``, and that works well for many languages. Different lexing rules can be
+defined by the magic ``$pattern`` attribute:
 
 ::
 
   {
-    lexemes: '-[a-z]+',
-    keywords: '-import -export'
+    keywords: {
+      $pattern: /-[a-z]+/, // allow keywords to begin with dash
+      keyword: '-import -export'
+    }
   }
 
+Note: The older ``lexemes`` setting has been deprecated in favor of using
+``keywords.$pattern``. They are functionally identical.
 
 Sub-modes
 ---------
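
Reviewer note on the Keywords section of docs/language-guide.rst above: the following is a minimal sketch of how a custom `$pattern` changes which "words" are compared against the keyword lists. The `findKeywords` helper is hypothetical and is not part of highlight.js; it only illustrates the idea described in the guide, assuming keyword groups are space-separated strings and the default pattern is `\w+`.

```javascript
// Hypothetical helper (not a highlight.js API): returns the keyword hits in
// `text`, using `keywords.$pattern` to decide what counts as a "word".
function findKeywords(keywords, text) {
  // Fall back to /\w+/, the default the language guide mentions.
  const { $pattern = /\w+/, ...groups } = keywords;

  // Map each keyword to its group name (the group becomes the class name).
  const groupOf = new Map();
  for (const [group, words] of Object.entries(groups)) {
    for (const word of words.split(' ').filter(Boolean)) {
      groupOf.set(word, group);
    }
  }

  // Rebuild the pattern with the global flag so exec() walks the whole text.
  // Note: this sketch drops any other flags on the original pattern.
  const source = $pattern instanceof RegExp ? $pattern.source : $pattern;
  const re = new RegExp(source, 'g');

  const hits = [];
  let match;
  while ((match = re.exec(text)) !== null) {
    if (match[0] === '') { re.lastIndex++; continue; } // avoid stalling on empty matches
    const group = groupOf.get(match[0]);
    if (group) hits.push({ word: match[0], group });
  }
  return hits;
}

const text = '-import foo -export bar';
// With the dash-aware pattern, "-import" and "-export" are lexed as whole words.
const withPattern = findKeywords({ $pattern: /-[a-z]+/, keyword: '-import -export' }, text);
// With the default pattern, the words are "import", "foo", "export", "bar",
// none of which equals "-import" or "-export", so nothing is found.
const withDefault = findKeywords({ keyword: '-import -export' }, text);
```

The point of the comparison is that the keyword list alone is not enough: the pattern must produce tokens that can equal the listed keywords.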

docs/mode-reference.rst

Lines changed: 11 additions & 6 deletions
@@ -241,14 +241,19 @@ and ``endSameAsBegin: true``.
 
 .. _lexemes:
 
-lexemes
-^^^^^^^
+lexemes (now keywords.$pattern)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 **type**: regexp
 
-A regular expression that extracts individual lexemes from language text to find :ref:`keywords <keywords>` among them.
-Default value is ``hljs.IDENT_RE`` which works for most languages.
+A regular expression that extracts individual "words" from the code to compare
+against :ref:`keywords <keywords>`. The default value is ``\w+`` which works for
+many languages.
 
+Note: It's now recommended that you use ``keywords.$pattern`` instead of
+``lexemes``, as this makes it easier to keep your keyword pattern associated
+with your keywords themselves, particularly if your keyword configuration is a
+constant that you repeat multiple times within different modes of your grammar.
 
 .. _keywords:
 
@@ -259,8 +264,8 @@ keywords
 
 Keyword definition comes in two forms:
 
-* ``'for while if else weird_voodoo|10 ... '`` -- a string of space-separated keywords with an optional relevance over a pipe
-* ``{'keyword': ' ... ', 'literal': ' ... '}`` -- an object whose keys are names of different kinds of keywords and values are keyword definition strings in the first form
+* ``'for while if|0 else weird_voodoo|10 ... '`` -- a string of space-separated keywords with an optional relevance over a pipe
+* ``{keyword: ' ... ', literal: ' ... ', $pattern: /\w+/ }`` -- an object that describes multiple sets of keywords and the pattern used to find them
 
 For detailed explanation see :doc:`Language definition guide </language-guide>`.
 
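
Reviewer note on the keyword string form documented in docs/mode-reference.rst above: the sketch below shows one way the `word|relevance` syntax could be read. `parseKeywordString` is a hypothetical name, and the default relevance of 1 for keywords without a pipe is an assumption made for illustration; the diff itself does not state the default.

```javascript
// Hypothetical parser (not a highlight.js API) for strings such as
// 'for while if|0 else weird_voodoo|10'.
// ASSUMPTION: keywords without an explicit "|n" get `defaultRelevance` (1 here).
function parseKeywordString(definition, defaultRelevance = 1) {
  const relevanceOf = new Map();
  for (const entry of definition.split(' ').filter(Boolean)) {
    const [word, relevance] = entry.split('|');
    relevanceOf.set(
      word,
      relevance === undefined ? defaultRelevance : Number(relevance)
    );
  }
  return relevanceOf;
}

const parsed = parseKeywordString('for while if|0 else weird_voodoo|10');
// parsed.get('if') is 0, parsed.get('weird_voodoo') is 10,
// and parsed.get('for') is the assumed default of 1.
```

Read this way, `if|0` marks a word that is still highlighted but contributes nothing to relevance scoring, while `weird_voodoo|10` contributes heavily.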

src/highlight.js

Lines changed: 4 additions & 4 deletions
@@ -131,8 +131,8 @@ const HLJS = function(hljs) {
 }
 
 let last_index = 0;
-top.lexemesRe.lastIndex = 0;
-let match = top.lexemesRe.exec(mode_buffer);
+top.keywordPatternRe.lastIndex = 0;
+let match = top.keywordPatternRe.exec(mode_buffer);
 let buf = "";
 
 while (match) {
@@ -148,8 +148,8 @@ const HLJS = function(hljs) {
 } else {
 buf += match[0];
 }
-last_index = top.lexemesRe.lastIndex;
-match = top.lexemesRe.exec(mode_buffer);
+last_index = top.keywordPatternRe.lastIndex;
+match = top.keywordPatternRe.exec(mode_buffer);
 }
 buf += mode_buffer.substr(last_index);
 emitter.addText(buf);
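
Reviewer note on the src/highlight.js hunks above: the renamed `keywordPatternRe` drives a standard `exec`/`lastIndex` scanning loop. The following standalone sketch reproduces that loop shape outside the library. `markKeywords` and its `isKeyword` callback are hypothetical stand-ins for the parts of the function the diff does not show, and wrapping keywords in square brackets is only a placeholder for real markup.

```javascript
// Hypothetical standalone version of the scanning loop (not the library code).
// ASSUMPTIONS: `keywordPatternRe` has the `g` flag and never matches the empty
// string; otherwise exec() would not advance and the loop would not terminate.
function markKeywords(modeBuffer, keywordPatternRe, isKeyword) {
  if (!keywordPatternRe.global) {
    throw new Error('keywordPatternRe must be created with the g flag');
  }
  let lastIndex = 0;
  keywordPatternRe.lastIndex = 0; // always scan from the start of the buffer
  let match = keywordPatternRe.exec(modeBuffer);
  let buf = '';

  while (match) {
    // Copy the text between the previous word and this one unchanged.
    buf += modeBuffer.substring(lastIndex, match.index);
    // Keywords are marked; any other word is copied through as-is.
    buf += isKeyword(match[0]) ? `[${match[0]}]` : match[0];
    lastIndex = keywordPatternRe.lastIndex;
    match = keywordPatternRe.exec(modeBuffer);
  }
  // Append whatever follows the last word.
  buf += modeBuffer.substring(lastIndex);
  return buf;
}

const known = new Set(['if', 'else']);
const marked = markKeywords('if x else y', /\w+/g, (word) => known.has(word));
// marked is '[if] x [else] y'
```

Resetting `lastIndex` before the first `exec` matters because a global regex object keeps its position between calls, and the same compiled pattern is reused for every buffer.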

src/languages/1c.js

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ Description: built-in language 1C:Enterprise (v7, v8)
55
Category: enterprise
66
*/
77

8-
export default function(hljs){
8+
export default function(hljs) {
99

1010
// общий паттерн для определения идентификаторов
1111
var UNDERSCORE_IDENT_RE = '[A-Za-zА-Яа-яёЁ_][A-Za-zА-Яа-яёЁ_0-9]+';
@@ -446,9 +446,12 @@ export default function(hljs){
446446
// meta : инструкции препроцессора, директивы компиляции
447447
var META = {
448448
className: 'meta',
449-
lexemes: UNDERSCORE_IDENT_RE,
449+
450450
begin: '#|&', end: '$',
451-
keywords: {'meta-keyword': KEYWORD + METAKEYWORD},
451+
keywords: {
452+
$pattern: UNDERSCORE_IDENT_RE,
453+
'meta-keyword': KEYWORD + METAKEYWORD
454+
},
452455
contains: [
453456
COMMENTS
454457
]
@@ -463,7 +466,6 @@ export default function(hljs){
463466
// function : объявление процедур и функций
464467
var FUNCTION = {
465468
className: 'function',
466-
lexemes: UNDERSCORE_IDENT_RE,
467469
variants: [
468470
{begin: 'процедура|функция', end: '\\)', keywords: 'процедура функция'},
469471
{begin: 'конецпроцедуры|конецфункции', keywords: 'конецпроцедуры конецфункции'}
@@ -474,9 +476,9 @@ export default function(hljs){
474476
contains: [
475477
{
476478
className: 'params',
477-
lexemes: UNDERSCORE_IDENT_RE,
478479
begin: UNDERSCORE_IDENT_RE, end: ',', excludeEnd: true, endsWithParent: true,
479480
keywords: {
481+
$pattern: UNDERSCORE_IDENT_RE,
480482
keyword: 'знач',
481483
literal: LITERAL
482484
},
@@ -496,8 +498,8 @@ export default function(hljs){
496498
return {
497499
name: '1C:Enterprise',
498500
case_insensitive: true,
499-
lexemes: UNDERSCORE_IDENT_RE,
500501
keywords: {
502+
$pattern: UNDERSCORE_IDENT_RE,
501503
keyword: KEYWORD,
502504
built_in: BUILTIN,
503505
class: CLASS,

src/languages/armasm.js

Lines changed: 1 addition & 1 deletion
@@ -21,8 +21,8 @@ export default function(hljs) {
 name: 'ARM Assembly',
 case_insensitive: true,
 aliases: ['arm'],
-lexemes: '\\.?' + hljs.IDENT_RE,
 keywords: {
+$pattern: '\\.?' + hljs.IDENT_RE,
 meta:
 //GNU preprocs
 '.2byte .4byte .align .ascii .asciz .balign .byte .code .data .else .end .endif .endm .endr .equ .err .exitm .extern .global .hword .if .ifdef .ifndef .include .irp .long .macro .rept .req .section .set .skip .space .text .word .arm .thumb .code16 .code32 .force_thumb .thumb_func .ltorg '+

src/languages/avrasm.js

Lines changed: 1 addition & 1 deletion
@@ -9,8 +9,8 @@ export default function(hljs) {
 return {
 name: 'AVR Assembly',
 case_insensitive: true,
-lexemes: '\\.?' + hljs.IDENT_RE,
 keywords: {
+$pattern: '\\.?' + hljs.IDENT_RE,
 keyword:
 /* mnemonic */
 'adc add adiw and andi asr bclr bld brbc brbs brcc brcs break breq brge brhc brhs ' +

src/languages/bash.js

Lines changed: 1 addition & 1 deletion
@@ -81,8 +81,8 @@ export default function(hljs) {
 return {
 name: 'Bash',
 aliases: ['sh', 'zsh'],
-lexemes: /\b-?[a-z\._]+\b/,
 keywords: {
+$pattern: /\b-?[a-z\._]+\b/,
 keyword:
 'if then else elif fi for while in do done case esac function',
 literal:

src/languages/basic.js

Lines changed: 1 addition & 1 deletion
@@ -11,8 +11,8 @@ export default function(hljs) {
 case_insensitive: true,
 illegal: '^\.',
 // Support explicitly typed variables that end with $%! or #.
-lexemes: '[a-zA-Z][a-zA-Z0-9_\$\%\!\#]*',
 keywords: {
+$pattern: '[a-zA-Z][a-zA-Z0-9_\$\%\!\#]*',
 keyword:
 'ABS ASC AND ATN AUTO|0 BEEP BLOAD|10 BSAVE|10 CALL CALLS CDBL CHAIN CHDIR CHR$|10 CINT CIRCLE ' +
 'CLEAR CLOSE CLS COLOR COM COMMON CONT COS CSNG CSRLIN CVD CVI CVS DATA DATE$ ' +

src/languages/clojure.js

Lines changed: 3 additions & 3 deletions
@@ -7,8 +7,11 @@ Category: lisp
 */
 
 export default function(hljs) {
+var SYMBOLSTART = 'a-zA-Z_\\-!.?+*=<>&#\'';
+var SYMBOL_RE = '[' + SYMBOLSTART + '][' + SYMBOLSTART + '0-9/;:]*';
 var globals = 'def defonce defprotocol defstruct defmulti defmethod defn- defn defmacro deftype defrecord';
 var keywords = {
+$pattern: SYMBOL_RE,
 'builtin-name':
 // Clojure keywords
 globals + ' ' +
@@ -41,8 +44,6 @@ export default function(hljs) {
 'lazy-seq spread list* str find-keyword keyword symbol gensym force rationalize'
 };
 
-var SYMBOLSTART = 'a-zA-Z_\\-!.?+*=<>&#\'';
-var SYMBOL_RE = '[' + SYMBOLSTART + '][' + SYMBOLSTART + '0-9/;:]*';
 var SIMPLE_NUMBER_RE = '[-+]?\\d+(\\.\\d+)?';
 
 var SYMBOL = {
@@ -86,7 +87,6 @@ export default function(hljs) {
 };
 var NAME = {
 keywords: keywords,
-lexemes: SYMBOL_RE,
 className: 'name', begin: SYMBOL_RE,
 starts: BODY
 };
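
Reviewer note on the grammar changes above: each language file applies the same mechanical rewrite, moving a mode-level `lexemes` value into `keywords.$pattern`. The sketch below captures that rewrite for a single mode object. `migrateLexemes` is a hypothetical helper, not something this commit or highlight.js provides, and it assumes a string `keywords` value is equivalent to `{ keyword: '...' }`, as the before/after example in the language guide suggests.

```javascript
// Hypothetical migration helper (not part of highlight.js): returns a copy of
// `mode` with `lexemes` folded into `keywords.$pattern`.
function migrateLexemes(mode) {
  const { lexemes, keywords, ...rest } = mode;
  // Nothing to do without both a pattern and keywords to attach it to.
  if (lexemes === undefined || keywords === undefined) return mode;

  // ASSUMPTION: a plain string is shorthand for the `keyword` group.
  const groups = typeof keywords === 'string' ? { keyword: keywords } : keywords;

  // An existing `$pattern` in `keywords` takes precedence over `lexemes`.
  return { ...rest, keywords: { $pattern: lexemes, ...groups } };
}

const migrated = migrateLexemes({
  lexemes: '-[a-z]+',
  keywords: '-import -export'
});
// migrated is { keywords: { $pattern: '-[a-z]+', keyword: '-import -export' } }
```

Unlike the Clojure change in this commit, which puts `$pattern` into the shared `keywords` constant once, this helper builds a new `keywords` object per mode; for grammars that reuse one keyword constant across modes, editing the constant directly is the simpler route.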
