
(XML) Unicode letters should be recognized in element and attribute names #3256

Closed
@martin-honnen


Describe the issue
XML allows Unicode letters in element and attribute names, but the xml.js language mode uses a regular expression that only checks for the ASCII letters A-Z.
As a result, XML with non-ASCII letters in element or attribute names is not highlighted. For example, in <categoría>producto</categoría> the Spanish word categoría is a well-formed XML element name, but it is not recognized as such by the regular expression const TAG_NAME_RE = regex.concat(/[A-Z_]/, regex.optional(/[A-Z0-9_.-]*:/), /[A-Z0-9_.-]*/); in https://github.com/highlightjs/highlight.js/blob/main/src/languages/xml.js#L12
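To make the mismatch concrete, here is a minimal sketch that rewrites the pattern above as one plain regular expression. It assumes that regex.concat joins the sources, that regex.optional wraps its argument in an optional group, and that matching is case-insensitive. ASCII_TAG_NAME_RE and matchedName are hypothetical names used only for this illustration.

```javascript
// Plain-regex approximation of TAG_NAME_RE from xml.js.
// Assumption: the grammar is matched case-insensitively, hence the "i" flag.
const ASCII_TAG_NAME_RE = /[A-Z_](?:[A-Z0-9_.-]*:)?[A-Z0-9_.-]*/i;

// Hypothetical helper: returns the part of `name` the pattern accepts.
function matchedName(name) {
  const m = ASCII_TAG_NAME_RE.exec(name);
  return m ? m[0] : null;
}

console.log(matchedName("category"));  // prints category
console.log(matchedName("categoría")); // prints categor
```

The match stops at the first non-ASCII letter, so only part of the element name is covered by the pattern.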

Which language seems to have the issue?
XML from https://github.com/highlightjs/highlight.js/blob/main/src/languages/xml.js

Are you using highlight or highlightAuto?

highlight

Sample Code to Reproduce

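// Assumes highlight.js is installed from npm; in a browser, hljs is the global instead.
const hljs = require('highlight.js');

console.log(hljs.highlight(`
<root>
  <categoría>test</categoría>
  <category>test</category>
</root>`, {language: 'xml'}).value)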

Expected behavior

The output for the XML markup <categoría>test</categoría> is currently &lt;categoría&gt;test&lt;/categoría&gt;, while it should be <span class="hljs-tag">&lt;<span class="hljs-name">categoría</span>&gt;</span>test<span class="hljs-tag">&lt;/<span class="hljs-name">categoría</span>&gt;</span>.

Additional context

The XML spec defines the allowed characters in https://www.w3.org/TR/xml/#NT-NameStartChar and https://www.w3.org/TR/xml/#NT-NameChar. I think it should be possible to fix the regular expressions used in xml.js, either by using the character ranges given in the XML spec or, if Unicode support in JavaScript regular expressions (the u flag) is used, by using e.g. \p{Letter} instead of A-Z.
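Both options can be sketched roughly as follows. This is an illustration, not a patch for xml.js: the names NAME_START, NAME_CHAR, SPEC_TAG_NAME_RE and LETTER_TAG_NAME_RE are hypothetical, the ranges are transcribed from the NameStartChar and NameChar productions with ":" left out so it can serve as the prefix separator, and how the u flag would be applied inside highlight.js is not addressed here.

```javascript
// Option 1: character ranges from the XML 1.0 spec (":" deliberately omitted).
const NAME_START =
  "A-Za-z_\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D" +
  "\\u037F-\\u1FFF\\u200C\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF" +
  "\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD\\u{10000}-\\u{EFFFF}";
// NameChar adds "-", ".", digits, U+00B7 and two further ranges.
const NAME_CHAR = NAME_START + "\\-.0-9\\u00B7\\u0300-\\u036F\\u203F\\u2040";

// Same shape as TAG_NAME_RE: start char, optional "prefix:", local part.
// The "u" flag is required for the \u{...} escapes.
const SPEC_TAG_NAME_RE = new RegExp(
  `[${NAME_START}](?:[${NAME_CHAR}]*:)?[${NAME_CHAR}]*`, "u");

// Option 2: Unicode property escapes. Shorter, but only an approximation
// of the spec ranges (assumption: letters plus numbers are close enough).
const LETTER_TAG_NAME_RE = /[\p{L}_](?:[\p{L}\p{N}_.-]*:)?[\p{L}\p{N}_.-]*/u;

// Hypothetical helper: true if the whole string is accepted as a name.
function isWholeName(re, name) {
  const m = re.exec(name);
  return m !== null && m.index === 0 && m[0] === name;
}

console.log(isWholeName(SPEC_TAG_NAME_RE, "categoría"));   // prints true
console.log(isWholeName(LETTER_TAG_NAME_RE, "categoría")); // prints true
console.log(isWholeName(SPEC_TAG_NAME_RE, "1abc"));        // prints false
```

The spec-range version follows the XML grammar more closely, while the \p{L} version is easier to read; which trade-off suits xml.js is a decision for the maintainers.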
