Consider refactoring one use of recursion to allow support in Shiki's JS engine

Context: [Shiki](https://github.com/shikijs/shiki) offers a JavaScript regex engine that includes highly accurate Oniguruma emulation, and [supports the vast majority of TextMate grammars](https://shiki.style/references/engine-js-compat). The JS engine allows lightweight use of TM grammars without requiring a large Oniguruma WASM binary. This is especially beneficial in browsers, for better load/start time. The JS engine is also faster to execute some grammars since the regexes run with native JS performance.

In the rare cases where a grammar (like C#) isn't supported, it's due to one of the following reasons:

1. Bugs in the TM grammar (invalid Oniguruma regexes that fail silently in `vscode-textmate`, but the error is not ignored by the JS engine).
2. Bugs in Oniguruma that the JS engine doesn't reproduce (leading to the JS engine rather than Oniguruma providing the intended highlighting).
3. Bugs in [`oniguruma-to-es`](https://github.com/slevithan/oniguruma-to-es) which I maintain and is used under the hood by the JS engine. A lot of work has gone into making the emulation extremely accurate, so this is rare but possible.
4. Use of a small number of rare regex features that are not yet emulated by `oniguruma-to-es`, or are not possible/feasible to emulate.

The C# grammar falls under number 4, due to one regex using multiple *overlapping* recursions. If this regex in the C# grammar was refactored, the grammar would be supported (as would the Razor grammar, since Razor's one erroring regex comes from embedding the C# grammar).

## The problematic regex

Here's the [regex in question](https://github.com/dotnet/csharp-tmLanguage/blob/9d446ab1fd31271c7c935069669e8504ddddb8ad/src/csharp.tmLanguage.yml#L2586-L2602).

And here's the problematic portion:

```
<
(?<type_args>
  [^<>()]++|
  <\g<type_args>*+>|
  \(\g<type_args>*+\)
)*+
>
```

## Potential solution

Perhaps this can be refactored to simply establish the boundaries of the type args, balancing the angled brackets but not the parentheses, like so:

```
<
(?<type_args>
  [^<>]++|
  <\g<type_args>*+>
)*+
>
```

Then, optionally, a subsequent regex could be used to check for and highlight the balanced parentheses inside.

Alternatively, a depth limit could be imposed for either parens or brackets. For example:

```
<
(?<type_args>
  [^<>()] |
  \( (?: [^()<>] | < [^()<>]* > | \( [^()<>]* \) )* \) |
  < \g<type_args>* >
)*
>
```

The above is very similar to the original in what it matches. It doesn't allow anything unbalanced, and it allows any depth of nested brackets (well, actually a depth limit of 20 since that's Oniguruma's recursion limit). But whenever it encounters a new set of parens, it only allows one level of nested parens or brackets inside it. So, for example, it continues to accurately match e.g. `<>`, `<a>`, `<a<>>`, `<<a>>`, `<()>`, `<(())>`, `<a(a)>`, `<(<>)>`, `<<>()>`, and `<(())<<<<(<>())>>>>>`, and it continues to accurately not match e.g. `<a(>`, `<)>`, `<a(><)>`, `<a<)>>`, `<<(<<<)>>`, `<<(>>`, `<<)>>`, and `<<(>)>`. The difference is that it doesn't match `<((()))>`, though the current regex does. However, it could be edited to allow more levels of nesting within parens (I'd be happy to do so if that helps!) -- it would just be a bit verbose.

## Overlapping recursions

The reason `oniguruma-to-es` doesn't allow emulating *overlapping* recursions is because it can get unmanageable very fast. Oniguruma uses a depth limit of 20, and `oniguruma-to-es` uses 20 as the default/max limit. Expanding a regex pattern in JS `RegExp` source with up to 20×20 expansions (for a basic case of overlapping recursion) is *maybe* on the edge of manageable, but as soon as you do something like (20×20)×(20×20) it seems like a very bad idea. In this case, the C# grammar is doing something that *looks* so simple 😊 but I believe would be 20 factorial times 2 (20!×2) expansions, which of course is unworkable.

What do you think? Are you open to refactoring/removing the overlapping recursion in the grammar?

Although this would currently only add support for your grammar to Shiki's JS engine, the JS engine (which is new) has seen significant adoption as it has continued to improve to its current robust state. I also expect more libraries that work with TM grammars to adopt `oniguruma-to-es` in the future. Removing this one case of overlapping recursion would mean that the C# and Razor grammars are supported by all of them.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Consider refactoring one use of recursion to allow support in Shiki's JS engine #327

The problematic regex

Potential solution

Overlapping recursions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Consider refactoring one use of recursion to allow support in Shiki's JS engine #327

Description

The problematic regex

Potential solution

Overlapping recursions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions