Commit

feat(engine-js): improve js engine by replacing hard-coded recursive reference
antfu committed Sep 13, 2024
1 parent 4f7e5d1 commit b3d493b
Showing 8 changed files with 131 additions and 24 deletions.
32 changes: 16 additions & 16 deletions docs/references/engine-js-compat.md
@@ -11,9 +11,9 @@
| | Count |
| :-------------- | --------------------------------: |
| Total Languages | 213 |
-| Fully Supported | [164](#fully-supported-languages) |
-| Mismatched | [20](#mismatched-languages) |
-| Unsupported | [29](#unsupported-languages) |
+| Fully Supported | [171](#fully-supported-languages) |
+| Mismatched | [24](#mismatched-languages) |
+| Unsupported | [18](#unsupported-languages) |

## Fully Supported Languages

@@ -29,6 +29,7 @@ Languages that works with the JavaScript RegExp engine, and will produce the sam
| applescript | ✅ OK | 152 | - | |
| ara | ✅ OK | 54 | - | |
| asm | ✅ OK | 297 | - | |
+| astro | ✅ OK | 1090 | - | |
| awk | ✅ OK | 36 | - | |
| ballerina | ✅ OK | 230 | - | |
| bat | ✅ OK | 58 | - | |
@@ -67,6 +68,7 @@ Languages that works with the JavaScript RegExp engine, and will produce the sam
| fluent | ✅ OK | 23 | - | |
| fortran-fixed-form | ✅ OK | 332 | - | |
| fortran-free-form | ✅ OK | 328 | - | |
+| fsharp | ✅ OK | 239 | - | |
| fsl | ✅ OK | 30 | - | |
| gdresource | ✅ OK | 157 | - | |
| gdscript | ✅ OK | 93 | - | |
@@ -117,6 +119,7 @@ Languages that works with the JavaScript RegExp engine, and will produce the sam
| move | ✅ OK | 120 | - | |
| narrat | ✅ OK | 34 | - | |
| nextflow | ✅ OK | 17 | - | |
+| nim | ✅ OK | 1126 | - | |
| nix | ✅ OK | 80 | - | |
| nushell | ✅ OK | 81 | - | |
| objective-c | ✅ OK | 223 | - | |
@@ -143,6 +146,7 @@ Languages that works with the JavaScript RegExp engine, and will produce the sam
| riscv | ✅ OK | 36 | - | |
| rust | ✅ OK | 89 | - | |
| sas | ✅ OK | 101 | - | |
+| sass | ✅ OK | 69 | - | |
| scala | ✅ OK | 112 | - | |
| scheme | ✅ OK | 34 | - | |
| scss | ✅ OK | 234 | - | |
@@ -154,6 +158,7 @@ Languages that works with the JavaScript RegExp engine, and will produce the sam
| sql | ✅ OK | 67 | - | |
| ssh-config | ✅ OK | 12 | - | |
| stylus | ✅ OK | 107 | - | |
+| svelte | ✅ OK | 1491 | - | |
| system-verilog | ✅ OK | 102 | - | |
| systemd | ✅ OK | 32 | - | |
| tasl | ✅ OK | 23 | - | |
@@ -176,6 +181,8 @@ Languages that works with the JavaScript RegExp engine, and will produce the sam
| verilog | ✅ OK | 33 | - | |
| vhdl | ✅ OK | 82 | - | |
| viml | ✅ OK | 72 | - | |
+| vue | ✅ OK | 1597 | - | |
+| vue-html | ✅ OK | 1620 | - | |
| vyper | ✅ OK | 238 | - | |
| wasm | ✅ OK | 78 | - | |
| wenyan | ✅ OK | 18 | - | |
@@ -200,12 +207,16 @@ Languages that does not throw with the JavaScript RegExp engine, but will produc
| elixir | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=elixir) | 708 | - | 179 |
| erlang | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=erlang) | 147 | - | 470 |
| glsl | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=glsl) | 186 | - | 306 |
+| haml | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=haml) | 1612 | - | 48 |
| kusto | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=kusto) | 60 | - | 40 |
+| markdown | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=markdown) | 118 | - | 648 |
+| mdc | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=mdc) | 784 | - | 407 |
| mermaid | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=mermaid) | 129 | - | 38 |
| nginx | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=nginx) | 378 | - | 4 |
| objective-cpp | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=objective-cpp) | 309 | - | 172 |
| php | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=php) | 1131 | - | 605 |
| po | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=po) | 23 | - | 336 |
+| pug | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=pug) | 1013 | - | 164 |
| ruby | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=ruby) | 1307 | - | 1 |
| shellscript | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=shellscript) | 148 | - | 56 |
| smalltalk | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=smalltalk) | 35 | - | 40 |
@@ -220,18 +231,9 @@ Languages that throws with the JavaScript RegExp engine (contains syntaxes that
| Language | Highlight Match | Patterns Parsable | Patterns Failed | Diff |
| ---------- | :------------------------------------------------------------------------- | ----------------: | --------------: | ---: |
| ada | ✅ OK | 201 | 1 | |
-| astro | ✅ OK | 1088 | 2 | |
-| sass | ✅ OK | 67 | 2 | |
-| fsharp | ✅ OK | 232 | 7 | |
-| nim | ✅ OK | 1119 | 7 | |
-| svelte | ✅ OK | 1482 | 9 | |
-| vue | ✅ OK | 1588 | 9 | |
-| vue-html | ✅ OK | 1611 | 9 | |
-| asciidoc | ✅ OK | 4388 | 93 | |
-| wikitext | ✅ OK | 5208 | 95 | |
+| wikitext | ✅ OK | 5217 | 86 | |
+| asciidoc | ✅ OK | 4390 | 91 | |
| blade | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=blade) | 1124 | 2 | |
-| pug | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=pug) | 1011 | 2 | 164 |
-| haml | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=haml) | 1603 | 9 | 48 |
| rst | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=rst) | 1835 | 22 | 62 |
| latex | [🚧 Mismatch](https://textmate-grammars-themes.netlify.app/?grammar=latex) | 2451 | 48 | 25 |
| powershell | ❌ Error | 87 | 1 | |
@@ -240,8 +242,6 @@ Languages that throws with the JavaScript RegExp engine (contains syntaxes that
| swift | ❌ Error | 325 | 4 | 18 |
| kotlin | ❌ Error | 52 | 6 | 2986 |
| purescript | ❌ Error | 67 | 6 | 1488 |
-| markdown | ❌ Error | 111 | 7 | 584 |
-| mdc | ❌ Error | 777 | 7 | 377 |
| apex | ❌ Error | 173 | 14 | 242 |
| haskell | ❌ Error | 136 | 21 | 12 |
| cpp | ❌ Error | 490 | 22 | 25 |
48 changes: 48 additions & 0 deletions packages/engine-javascript/scripts/generate.ts
@@ -0,0 +1,48 @@
import fs from 'node:fs/promises'
import { expandRecursiveBackReference } from './utils'

interface ReplacementRecursiveBackReference {
  type: 'recursive-back-reference'
  regex: string
  groupName: string
  fallback: string
  recursive?: number
}

interface ReplacementStatic {
  type: 'static'
  regex: string
  replacement: string
}

type Replacement = ReplacementRecursiveBackReference | ReplacementStatic

const replacements: Replacement[] = [
  {
    // Subroutine recursive reference are not supported in JavaScript regex engine.
    // We expand a few levels of recursion to literals to simulate the behavior (incomplete)
    type: 'recursive-back-reference',
    regex: '(?<square>[^\\[\\]\\\\]|\\\\.|\\[\\g<square>*+\\])',
    groupName: 'square',
    fallback: '(?:[^\\[\\]\\\\])',
  },
  {
    type: 'recursive-back-reference',
    regex: '(?<url>(?>[^\\s()]+)|\\(\\g<url>*\\))',
    groupName: 'url',
    fallback: '[^\\s\\(\\)]',
  },
]

const result = replacements.map((r) => {
  switch (r.type) {
    case 'recursive-back-reference':
      return [r.regex, expandRecursiveBackReference(r.regex, r.groupName, r.fallback, r.recursive ?? 2)]
    case 'static':
      return [r.regex, r.replacement]
    default:
      throw new Error(`Unknown replacement type: ${(r as any).type}`)
  }
})

fs.writeFile(new URL('../src/replacements.ts', import.meta.url), `// Generated by script\n\nexport const replacements = ${JSON.stringify(result, null, 2)} as [string, string][]\n`, 'utf-8')
21 changes: 21 additions & 0 deletions packages/engine-javascript/scripts/utils.ts
@@ -0,0 +1,21 @@
export function expandRecursiveBackReference(
  regex: string,
  name: string,
  fallback: string,
  recursive = 2,
) {
  const refMarker = new RegExp(`\\\\g<${name}>`, 'g')
  const groupMaker = new RegExp(`\\(\\?<${name}>`, 'g')
  const normalized = regex.replace(groupMaker, '(?:')

  let out = regex
  for (let i = 0; i < recursive; i++) {
    out = out.replace(refMarker, normalized)
  }

  out = out
    .replace(refMarker, fallback)
    .replace(groupMaker, '(?:')

  return out
}
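
To see what the expansion buys in practice, the sketch below reuses the function from this diff and compiles its output with the native `RegExp` constructor. The `^`/`$` anchors, the `matchesWithDepth` helper name, and the sample inputs are assumptions added for illustration. The pattern uses `*` rather than `*+`, as in the test case in this commit, so that the result is valid JavaScript regex syntax.

```typescript
// Copied from scripts/utils.ts in this commit so the sketch is self-contained.
function expandRecursiveBackReference(
  regex: string,
  name: string,
  fallback: string,
  recursive = 2,
): string {
  const refMarker = new RegExp(`\\\\g<${name}>`, 'g')
  const groupMaker = new RegExp(`\\(\\?<${name}>`, 'g')
  const normalized = regex.replace(groupMaker, '(?:')

  let out = regex
  for (let i = 0; i < recursive; i++) {
    out = out.replace(refMarker, normalized)
  }

  return out
    .replace(refMarker, fallback)
    .replace(groupMaker, '(?:')
}

const square = '(?<square>[^\\[\\]\\\\]|\\\\.|\\[\\g<square>*\\])'
const fallback = '(?:[^\\[\\]\\\\])'

// Hypothetical helper: anchor the expanded pattern and test a whole string.
function matchesWithDepth(input: string, recursive: number): boolean {
  const expanded = expandRecursiveBackReference(square, 'square', fallback, recursive)
  return new RegExp(`^${expanded}$`).test(input)
}

// One bracket level needs no expansion at all.
console.log(matchesWithDepth('[a]', 0))
// Three bracket levels fit into two expansions plus the fallback.
console.log(matchesWithDepth('[a[b[c]]]', 2))
// A fourth level exceeds what two expansions can describe.
console.log(matchesWithDepth('[a[b[c[d]]]]', 2))
```

Under these assumptions the first two calls report a match and the third does not, which is the "incomplete" simulation the comment in `generate.ts` refers to: nesting beyond the expanded depth is simply not recognised.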
9 changes: 8 additions & 1 deletion packages/engine-javascript/src/index.ts
@@ -4,6 +4,7 @@ import type {
RegexEngineString,
} from '@shikijs/types'
import { onigurumaToRegexp } from 'oniguruma-to-js'
+import { replacements } from './replacements'

export interface JavaScriptRegexEngineOptions {
/**
@@ -77,7 +78,13 @@ export class JavaScriptScanner implements PatternScanner {
throw cached
}
try {
-const regex = regexConstructor(p)
+let pattern = p
+if (simulation) {
+  for (const [from, to] of replacements) {
+    pattern = pattern.replaceAll(from, to)
+  }
+}
+const regex = regexConstructor(pattern)
cache?.set(p, regex)
return regex
}
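
The substitution step added in this hunk is plain literal string replacement, applied before the pattern reaches `regexConstructor`. A minimal sketch of that step in isolation follows. The `applyReplacements` helper and the sample grammar pattern are hypothetical, the table entry is copied from the generated `replacements.ts` in this commit, and `split`/`join` stands in for `replaceAll` only to avoid depending on an ES2021 library target.

```typescript
// One entry copied from the generated src/replacements.ts in this commit.
const urlFrom = '(?<url>(?>[^\\s()]+)|\\(\\g<url>*\\))'
const urlTo = '(?:(?>[^\\s()]+)|\\((?:(?>[^\\s()]+)|\\((?:(?>[^\\s()]+)|\\([^\\s\\(\\)]*\\))*\\))*\\))'
const replacements: [string, string][] = [[urlFrom, urlTo]]

// Hypothetical helper mirroring the loop in the diff: each `from` is matched
// as a literal substring, not as a regular expression.
function applyReplacements(pattern: string, table: [string, string][]): string {
  let out = pattern
  for (const [from, to] of table) {
    out = out.split(from).join(to)
  }
  return out
}

// Hypothetical grammar pattern that embeds the recursive `url` group.
const original = `https?://${urlFrom}+`
const rewritten = applyReplacements(original, replacements)

// The subroutine call has been expanded away.
console.log(rewritten.includes('\\g<url>'))
// Patterns without a known fragment pass through unchanged.
console.log(applyReplacements('\\bfoo\\b', replacements))
```

Because matching is literal, a grammar that writes the same recursive group with different spacing or escaping would not be rewritten; each such variant would need its own table entry.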
12 changes: 12 additions & 0 deletions packages/engine-javascript/src/replacements.ts
@@ -0,0 +1,12 @@
// Generated by script

export const replacements = [
[
'(?<square>[^\\[\\]\\\\]|\\\\.|\\[\\g<square>*+\\])',
'(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\])*+\\])*+\\])*+\\])',
],
[
'(?<url>(?>[^\\s()]+)|\\(\\g<url>*\\))',
'(?:(?>[^\\s()]+)|\\((?:(?>[^\\s()]+)|\\((?:(?>[^\\s()]+)|\\([^\\s\\(\\)]*\\))*\\))*\\))',
],
] as [string, string][]
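
Note that the generated strings are still Oniguruma-flavoured: the `square` entry keeps the possessive quantifier `*+` and the `url` entry keeps the atomic group `(?>…)`, neither of which the native JavaScript `RegExp` constructor accepts. The sketch below checks this with a hypothetical `compilesNatively` helper. It assumes, as the diff suggests but does not state, that a later conversion step (the engine imports `onigurumaToRegexp` from `oniguruma-to-js`) is what deals with those constructs.

```typescript
// Hypothetical helper: report whether the native RegExp constructor accepts a source string.
function compilesNatively(source: string): boolean {
  try {
    new RegExp(source)
    return true
  }
  catch {
    return false
  }
}

// Replacement values copied from the generated src/replacements.ts in this commit.
const squareExpanded = '(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\])*+\\])*+\\])*+\\])'
const urlExpanded = '(?:(?>[^\\s()]+)|\\((?:(?>[^\\s()]+)|\\((?:(?>[^\\s()]+)|\\([^\\s\\(\\)]*\\))*\\))*\\))'

// Possessive quantifier `*+` is rejected by native RegExp.
console.log(compilesNatively(squareExpanded))
// Atomic group `(?>...)` is rejected by native RegExp.
console.log(compilesNatively(urlExpanded))
// The same shape with a plain non-capturing group and greedy quantifier is accepted.
console.log(compilesNatively('(?:[^\\s()]+)*'))
```

In other words, the replacement table only removes the recursion; it is not meant to produce strings that can be handed to `new RegExp` directly.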
19 changes: 19 additions & 0 deletions packages/engine-javascript/test/scripts.test.ts
@@ -0,0 +1,19 @@
import { describe, expect, it } from 'vitest'
import { expandRecursiveBackReference } from '../scripts/utils'

describe('expandRecursiveBackReference', () => {
it('case 1', () => {
const name = 'square'
const regex = '(?<square>[^\\[\\]\\\\]|\\\\.|\\[\\g<square>*\\])'
const fallback = '(?:[^\\[\\]\\\\])'

expect(expandRecursiveBackReference(regex, name, fallback, 0))
.toMatchInlineSnapshot(`"(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\])*\\])"`)

expect(expandRecursiveBackReference(regex, name, fallback, 1))
.toMatchInlineSnapshot(`"(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\])*\\])*\\])"`)

expect(expandRecursiveBackReference(regex, name, fallback, 2))
.toMatchInlineSnapshot(`"(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\]|\\\\.|\\[(?:[^\\[\\]\\\\])*\\])*\\])*\\])"`)
})
})
12 changes: 6 additions & 6 deletions pnpm-lock.yaml

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pnpm-workspace.yaml
@@ -53,7 +53,7 @@ catalog:
minimist: ^1.2.8
monaco-editor-core: ^0.51.0
ofetch: ^1.3.4
-oniguruma-to-js: 0.4.0
+oniguruma-to-js: 0.4.3
picocolors: ^1.1.0
pinia: ^2.2.2
pnpm: ^9.10.0