
[Markdown] Refactor fenced code blocks #4430

Merged
deathaxe merged 2 commits into sublimehq:master from deathaxe:pr/markdown/refactor-fenced-codeblocks on Feb 2, 2026

@deathaxe (Collaborator) commented Feb 1, 2026

This PR...

  1. uses two layers of `embed` to:
    a) consume fenced code block punctuation in dedicated patterns of a single `embed`/`escape` statement;
    b) lazy-load embedded syntax definitions on demand, so that the context sanity limit is not exceeded.

    As a result, a single Oniguruma fallback context is injected, and only by the top-level `embed`/`escape`, which should prevent stack overflows and similar problems in some circumstances.

    It also significantly reduces pattern redundancy when adding support for new syntaxes, as new blocks no longer need to deal with punctuation-related details.

    Patterns for syntax-highlighted code blocks immediately start matching language names and don't need to provide individual escape patterns.

    This works around sublime_text#6823: opening an .md file containing a line with 106 or more tildes causes Sublime Text 4 (build 4200) to quit.

  2. merges all included `fenced-...` contexts into `fenced-code-block-body` to reduce syntax cache size, and creates a new set of contexts strictly separated from the previous structure. So if a syntax extending this Markdown syntax injects fenced code blocks, only that injection will fail, without breaking the whole syntax definition.

    Contexts are merged because language name patterns are not expected to require overrides. Content can be replaced by overriding the new `fenced-code-block-...-content` contexts.

  3. removes unnecessary capture groups (leading whitespace) from the `fenced_code_block_start` pattern.

  4. fixes patterns that did not allow backticks in the info strings of syntax-highlighted fenced code blocks using tildes.

  5. enables arbitrary syntax highlighting in info strings, which is currently used for pandoc-style attributes.
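The fence rules behind items 3 and 4 can be sketched in Python. This is not the syntax definition's own pattern set; it is a minimal illustration assuming the CommonMark rules for fenced code blocks, and the names `parse_open_fence`, `is_close_fence` and `iter_fenced_blocks` are hypothetical. Leading indentation is matched without being captured, and a backtick in the info string rejects a backtick fence but not a tilde fence.

```python
import re
from typing import Iterator, List, NamedTuple, Optional, Tuple

# Hypothetical illustration of CommonMark fence rules (not the YAML syntax).
# Up to 3 spaces of indentation are matched but NOT captured (cf. item 3).
OPEN_FENCE = re.compile(r" {0,3}(?P<fence>`{3,}|~{3,})(?P<info>.*)")
CLOSE_FENCE = re.compile(r" {0,3}(?P<fence>`{3,}|~{3,})[ \t]*")


class Fence(NamedTuple):
    char: str    # "`" or "~"
    length: int  # number of fence characters
    info: str    # info string, trimmed of spaces and tabs


def parse_open_fence(line: str) -> Optional[Fence]:
    """Return the opening fence on `line` (no trailing newline), or None."""
    m = OPEN_FENCE.fullmatch(line)
    if m is None:
        return None
    fence = m.group("fence")
    info = m.group("info").strip(" \t")
    # The info string of a backtick fence may not contain backticks;
    # tilde fences have no such restriction (cf. item 4).
    if fence[0] == "`" and "`" in info:
        return None
    return Fence(fence[0], len(fence), info)


def is_close_fence(line: str, opener: Fence) -> bool:
    """A closing fence uses the same character and is at least as long."""
    m = CLOSE_FENCE.fullmatch(line)
    return (
        m is not None
        and m.group("fence")[0] == opener.char
        and len(m.group("fence")) >= opener.length
    )


def iter_fenced_blocks(lines: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (info string, body lines) for each fenced code block.

    Fence punctuation is handled once here, so consumers only see the info
    string and body. An unclosed block runs to the end of the input.
    """
    opener: Optional[Fence] = None
    body: List[str] = []
    for line in lines:
        if opener is None:
            opener = parse_open_fence(line)
            body = []
        elif is_close_fence(line, opener):
            yield opener.info, body
            opener = None
        else:
            body.append(line)
    if opener is not None:
        yield opener.info, body
```

Keeping all punctuation handling in one place means a consumer of `iter_fenced_blocks` only ever deals with a language name and content, which loosely mirrors the separation described in item 1a.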
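Item 1b defers loading embedded syntaxes until a code block actually needs them. As a loose analogy only (Sublime Text's syntax engine is not implemented like this), a lazy registry in Python might look as follows. `LazySyntaxRegistry` and its loaders are hypothetical; each loader runs at most once, and only when its language (or an alias) is first resolved.

```python
from typing import Callable, Dict, List, Optional


class LazySyntaxRegistry:
    """Hypothetical registry mapping info-string language names to syntaxes.

    Loaders are stored up front but only invoked the first time their
    language is resolved; the result is cached for later lookups.
    """

    def __init__(self) -> None:
        self._canonical: Dict[str, str] = {}              # alias -> canonical name
        self._loaders: Dict[str, Callable[[], str]] = {}  # canonical -> loader
        self._cache: Dict[str, str] = {}                  # canonical -> loaded syntax
        self.loaded: List[str] = []                       # load order, for inspection

    def register(self, name: str, aliases: List[str], loader: Callable[[], str]) -> None:
        """Register a loader under a canonical name plus case-insensitive aliases."""
        self._loaders[name] = loader
        for alias in [name, *aliases]:
            self._canonical[alias.lower()] = name

    def resolve(self, language: str) -> Optional[str]:
        """Return the syntax for `language`, loading it on first use; None if unknown."""
        name = self._canonical.get(language.lower())
        if name is None:
            return None
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
            self.loaded.append(name)
        return self._cache[name]
```

For example, after registering stand-in loaders for `python` (alias `py`) and `rust`, resolving `py` and then `Python` runs the Python loader once, while the Rust loader never runs at all.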
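Item 5 concerns pandoc-style attributes in info strings, such as `{#mycode .haskell .numberLines startFrom="100"}`, the form used in the pandoc manual's example. The hypothetical `parse_info_string` below sketches what such an info string carries. It assumes only that brace form or a bare leading language word, treats the first class as the language (an assumption), and ignores other pandoc variants.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# One attribute token inside {...}: #identifier, .class, or key=value
# (value either double-quoted or a bare run without spaces, quotes or braces).
ATTR_TOKEN = re.compile(
    r'#(?P<id>[\w-]+)'
    r'|\.(?P<cls>[\w-]+)'
    r'|(?P<key>[\w-]+)=(?:"(?P<qval>[^"]*)"|(?P<val>[^\s"}]+))'
)


class InfoString(NamedTuple):
    language: Optional[str]
    identifier: Optional[str]
    classes: List[str]
    attributes: Dict[str, str]


def parse_info_string(info: str) -> InfoString:
    """Split a fenced code block info string into language and attributes."""
    info = info.strip(" \t")
    if info.startswith("{") and info.endswith("}"):
        identifier: Optional[str] = None
        classes: List[str] = []
        attributes: Dict[str, str] = {}
        for m in ATTR_TOKEN.finditer(info[1:-1]):
            if m.group("id") is not None:
                identifier = m.group("id")
            elif m.group("cls") is not None:
                classes.append(m.group("cls"))
            else:
                qval = m.group("qval")
                attributes[m.group("key")] = qval if qval is not None else m.group("val")
        # Assumption: the first class names the language.
        return InfoString(classes[0] if classes else None, identifier, classes, attributes)
    # Otherwise treat the first word as the language name (CommonMark convention).
    words = info.split()
    return InfoString(words[0] if words else None, None, [], {})
```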

Notes:

  1. This is a breaking change for third-party syntax definitions that extend the core Markdown syntax to add more fenced code block syntaxes.

    Known packages are:

  2. The overall syntax cache is reduced by about 60 kB.

  3. Parsing performance, benchmarked against the syntax test file, is unchanged.

keith-hall previously approved these changes Feb 1, 2026
deathaxe added a commit to SublimeText/CoffeeScript that referenced this pull request Feb 1, 2026
deathaxe added a commit to SublimeText/Astro that referenced this pull request Feb 1, 2026
deathaxe added a commit to SublimeText/CoffeeScript that referenced this pull request Feb 1, 2026
... for template languages
deathaxe merged commit 3c2ed52 into sublimehq:master Feb 2, 2026
2 checks passed
deathaxe deleted the pr/markdown/refactor-fenced-codeblocks branch February 2, 2026 18:26