
keycap emoji is treated as formatting #646

@mikesamuel

Description

The keycap emoji, *️⃣, used for the '*' telephone button, is encoded as a sequence of three code points:

  • U+002A ASTERISK
  • U+FE0F VARIATION SELECTOR-16
  • U+20E3 COMBINING ENCLOSING KEYCAP
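To make the three code points concrete, here is a small Python sketch. It uses the standard unicodedata module to print each code point with its Unicode general category and name, and then counts the UTF-16 code units. The names KEYCAP_ASTERISK and describe are illustrative only and do not come from any library.

```python
import unicodedata

# The '*' keycap emoji: base character, VS16, combining enclosing keycap.
KEYCAP_ASTERISK = "\u002A\uFE0F\u20E3"


def describe(s: str) -> list[tuple[str, str, str]]:
    """Return (code point, general category, character name) for each char in s."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in s
    ]


for row in describe(KEYCAP_ASTERISK):
    print(*row)
# U+002A Po ASTERISK
# U+FE0F Mn VARIATION SELECTOR-16
# U+20E3 Me COMBINING ENCLOSING KEYCAP

# All three code points are in the BMP, so UTF-16 uses one code unit for each.
utf16_units = len(KEYCAP_ASTERISK.encode("utf-16-le")) // 2
print("UTF-16 code units:", utf16_units)  # UTF-16 code units: 3
```

The general categories matter for the parsing problem. U+FE0F is a nonspacing mark (Mn), so it is neither whitespace nor punctuation.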

CommonMark sometimes treats the keycap's leading asterisk as an emphasis delimiter, as in **️⃣abc** (\x{2A 2A FE0F 20E3 61 62 63 2A 2A}). Here the second asterisk belongs to the keycap.
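The parser's behavior follows from the CommonMark definition of a left-flanking delimiter run. That definition looks only at the single characters immediately before and after the run. Below is a simplified Python transcription of the flanking rules. It uses the CommonMark 0.31 definitions of Unicode whitespace and punctuation; earlier spec versions define punctuation slightly differently. This is not the code of any particular implementation, and the function names are hypothetical.

```python
import unicodedata


def is_unicode_whitespace(ch: str) -> bool:
    # CommonMark 0.31: general category Zs, or tab, LF, FF, CR.
    return ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_unicode_punctuation(ch: str) -> bool:
    # CommonMark 0.31: general category P (punctuation) or S (symbol).
    return unicodedata.category(ch)[0] in ("P", "S")


def neighbors(text: str, start: int, end: int) -> tuple[str, str]:
    # The start and end of the line count as whitespace.
    before = text[start - 1] if start > 0 else "\n"
    after = text[end] if end < len(text) else "\n"
    return before, after


def is_left_flanking(text: str, start: int, end: int) -> bool:
    """True if the delimiter run text[start:end] is left-flanking."""
    before, after = neighbors(text, start, end)
    if is_unicode_whitespace(after):
        return False
    if not is_unicode_punctuation(after):
        return True
    return is_unicode_whitespace(before) or is_unicode_punctuation(before)


def is_right_flanking(text: str, start: int, end: int) -> bool:
    """True if the delimiter run text[start:end] is right-flanking."""
    before, after = neighbors(text, start, end)
    if is_unicode_whitespace(before):
        return False
    if not is_unicode_punctuation(before):
        return True
    return is_unicode_whitespace(after) or is_unicode_punctuation(after)


# The input from this issue: "**" + U+FE0F U+20E3 + "abc" + "**".
source = "**\uFE0F\u20E3abc**"
print(is_left_flanking(source, 0, 2))   # opening run, followed by U+FE0F (Mn)
print(is_right_flanking(source, 7, 9))  # closing run, preceded by "c"
```

U+FE0F is not punctuation, so the opening "**" run is left-flanking and can open emphasis. The closing run is right-flanking. The pair therefore qualifies as strong emphasis, even though the second opening asterisk is part of the keycap.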

To reproduce

permalink to REPL

Screenshot

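Observe that the output shows a placeholder glyph followed by bold "abc", and that the HTML tab shows <p><strong>️⃣abc</strong></p>. The keycap's asterisk was consumed as part of the opening ** delimiter, leaving U+FE0F U+20E3 with no base character.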

Instead, I expect the output to contain all three UTF-16 code units of the *️⃣ emoji.
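As a quick check of the mismatch, the following Python sketch lists the code points inside <strong> in the reported HTML and tests whether the full keycap sequence survives. The helper strong_contents is hypothetical and only handles flat, non-nested HTML like the snippet reported here.

```python
import re

KEYCAP_ASTERISK = "*\uFE0F\u20E3"

# HTML from the HTML tab, with the invisible code points written as escapes.
reported_html = "<p><strong>\uFE0F\u20E3abc</strong></p>"


def strong_contents(html: str) -> list[str]:
    """Return the text of each non-nested <strong>...</strong> element."""
    return re.findall(r"<strong>(.*?)</strong>", html, flags=re.DOTALL)


for content in strong_contents(reported_html):
    # Prints FE0F 20E3 61 62 63: there is no 2A, so the keycap's asterisk is gone.
    print(" ".join(f"{ord(ch):X}" for ch in content))

print(KEYCAP_ASTERISK in reported_html)  # False: the keycap was split apart
```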

Relevant specifications

Unicode TR #51 (Unicode Emoji) defines:

ED-14c. emoji keycap sequence — A sequence of the following form:

emoji_keycap_sequence := [0-9#*] \x{FE0F 20E3}
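The production maps directly onto a regular expression. Below is a minimal Python sketch that finds keycap sequences by code point offset; EMOJI_KEYCAP_SEQUENCE and keycap_spans are illustrative names, not part of any library. On the input from this issue, it finds a single keycap starting at offset 1. That is the second of the two leading asterisks, which the parser folded into the ** delimiter run.

```python
import re

# emoji_keycap_sequence := [0-9#*] \x{FE0F 20E3}
EMOJI_KEYCAP_SEQUENCE = re.compile("[0-9#*]\uFE0F\u20E3")


def keycap_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) code point offsets of each emoji keycap sequence."""
    return [m.span() for m in EMOJI_KEYCAP_SEQUENCE.finditer(text)]


source = "**\uFE0F\u20E3abc**"
print(keycap_spans(source))  # [(1, 4)]
```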

Possibly out of scope, but to get the keycap in the first line of this issue to render properly in GitHub Flavored Markdown, I had to precede it with a backslash (\).
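Here is a minimal sketch of that workaround as a preprocessing step, assuming the text is CommonMark or GFM source. The helper escape_keycap_asterisks is hypothetical and not part of any Markdown library. CommonMark treats a backslash-escaped ASCII punctuation character as a literal character. An escaped asterisk therefore cannot join a delimiter run, and the keycap's three code points stay together.

```python
import re

# A '*' that begins an emoji keycap sequence (U+FE0F U+20E3 follows it).
# Among the keycap bases [0-9#*], only '*' is a CommonMark emphasis delimiter.
KEYCAP_ASTERISK_BASE = re.compile("[*](?=\uFE0F\u20E3)")


def escape_keycap_asterisks(markdown: str) -> str:
    """Backslash-escape each '*' that is the base of a keycap sequence.

    Limitations: does not skip code spans or code blocks, where the
    backslash would be shown literally, and does not detect asterisks
    that are already escaped.
    """
    return KEYCAP_ASTERISK_BASE.sub(lambda m: "\\" + m.group(0), markdown)


print(escape_keycap_asterisks("**\uFE0F\u20E3abc**"))
```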
