Description
The following three examples of fenced code blocks are valid in MkDocs, according to the docs: https://github.com/Python-Markdown/markdown/blob/master/docs/extensions/fenced_code_blocks.md
However, currently only the first one is highlighted in VS Code as Python code:
If the language is the only attribute, then the dot prefixing and curly braces may
be omitted.
``` python hl_lines="1-2 4" title="My title"
range(1..2)
range(1..2)
range(1..2)
range(1..2)
```
The rest of them are not highlighted currently:
Technically the key/value pairs should not be allowed outside of the curly
braces, as I read the docs, but they are not really explicit on this. MkDocs
produces valid output for this example both without and with curly braces.
``` .python hl_lines="1-2 4" title="My title"
range(1..2)
range(1..2)
range(1..2)
range(1..2)
```
When the attributes are enclosed in curly braces, MkDocs dictates that the
language must be prefixed with a dot, but then an HTML `id` and multiple `class`
attributes can be added, along with key/value pairs.
``` { .python #id .class hl_lines="1-2 4" title="My title" }
range(1..2)
range(1..2)
range(1..2)
range(1..2)
```
If the space between the opening curly brace and the dot-prefixed language attribute is removed in the last example, then it is matched, due to #57, which added support for Codebraid-style Pandoc attributes.
I have been playing around with an updated RegEx that will properly match the above by 1) allowing languages to be dot prefixed, and 2) generalising the Codebraid contribution by removing it as an identifier of the few supported languages and including it in the RegEx so all languages can be surrounded by curly braces:
(^|\\G)(\\s*)([\`~]{3,})\\s*(?i:(?:\\{\\s*\\.?(?<LANG>${identifiers.join('|')})(?<ATTR>(?:\\s+|:|,|\\{|\\?)[^\`\\r\\n]*?)?\\s*\\})|(?:\\.?(\\g<LANG>)(\\g<ATTR>)?))$
I decided to use named groups in the regex so that I could reference them again in the second alternative (the one without curly braces). I don't know if this makes the RegEx slower compared to explicitly inserting the language and attribute specification twice.
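To make the intended matches concrete, here is a rough Python approximation of the pattern above. It is only a sketch, not the grammar's actual pattern: Python's `re` module has no `\G` anchor and no `\g<NAME>` subexpression calls, so the language and attribute sub-patterns are simply inserted twice, and `IDENTIFIERS` and `parse_fence` are hypothetical names standing in for the real identifier list and the grammar's scope assignment.

```python
import re

# Hypothetical stand-in for the grammar's identifier list.
IDENTIFIERS = ["python", "py"]

LANG = "|".join(re.escape(i) for i in IDENTIFIERS)
# Same attribute sub-pattern as in the proposed RegEx.
ATTR = r"(?:\s+|:|,|\{|\?)[^`\r\n]*?"

# Approximation: `\G` is dropped and the LANG/ATTR groups are duplicated
# because Python's `re` cannot call a named group as a subexpression.
FENCE_OPEN = re.compile(
    r"^(\s*)([`~]{3,})\s*(?i:"
    # Alternative 1: language and attributes wrapped in curly braces.
    r"\{\s*\.?(?P<lang_braced>" + LANG + r")(?P<attr_braced>" + ATTR + r")?\s*\}"
    r"|"
    # Alternative 2: bare language, optionally dot-prefixed.
    r"\.?(?P<lang_bare>" + LANG + r")(?P<attr_bare>" + ATTR + r")?"
    r")$"
)


def parse_fence(line):
    """Return (language, attributes) for a fence opener, or None if no match."""
    m = FENCE_OPEN.match(line)
    if m is None:
        return None
    lang = m.group("lang_braced") or m.group("lang_bare")
    attr = m.group("attr_braced") or m.group("attr_bare") or ""
    return lang, attr


if __name__ == "__main__":
    for opener in (
        '``` python hl_lines="1-2 4" title="My title"',
        '``` .python hl_lines="1-2 4" title="My title"',
        '``` { .python #id .class hl_lines="1-2 4" title="My title" }',
    ):
        print(parse_fence(opener))
```

In this approximation all three fence openers from the examples above yield `python` as the language, with the remainder captured as attributes (including the leading space).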
Currently this updated RegEx only changes test/colorize-results/pr-57_md.json, as it no longer assigns the language scope to the entire string "{ .python .cb.nb jupyter_kernel=python3 }". Instead it discards the braces and the dot, assigns the language scope to the string "python" (as one would expect), and assigns the attribute scope to the rest: " .cb.nb jupyter_kernel=python3".
The downside currently seems to be that it includes the leading space in the attribute part. However, I'm not sure this is worth spending more energy on, as the attributes are not really used for anything as far as I can see; at least now it actually assigns the attribute scope to that example.
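If the leading space ever turns out to matter, one possible direction is to check for the separator with a lookahead and consume the whitespace before the attribute capture starts. The following is only a Python sketch of that idea, restricted to the bare (brace-less) form and a single hypothetical identifier; I have not tried it in the grammar itself, and `WITH_SPACE` and `TRIMMED` are illustrative names.

```python
import re

# Hypothetical single identifier, to keep the sketch short.
LANG = "python"

# Shape used in the proposed RegEx: the separator (here a space)
# is part of the attribute capture.
WITH_SPACE = re.compile(
    r"^```\s*\.?(?P<lang>" + LANG + r")"
    r"(?P<attr>(?:\s+|:|,|\{|\?)[^`\r\n]*?)?$",
    re.IGNORECASE,
)

# Variant: a lookahead checks that a separator follows the language,
# then leading whitespace is consumed outside the attribute capture.
TRIMMED = re.compile(
    r"^```\s*\.?(?P<lang>" + LANG + r")"
    r"(?:(?=[\s:,{?])\s*(?P<attr>[^`\r\n]*?))?$",
    re.IGNORECASE,
)

if __name__ == "__main__":
    opener = '``` python hl_lines="1-2 4"'
    print(repr(WITH_SPACE.match(opener).group("attr")))
    print(repr(TRIMMED.match(opener).group("attr")))
```

The lookahead keeps the requirement that the language is followed by a separator, so something like `pythonx` is still rejected, while non-space separators such as `:` remain part of the captured attributes.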