Skip to content

dumper.js, why my emojis are escaped ? #587

Closed
@ysavary

Description

@ysavary

Hi all,

The yaml specification says (https://yaml.org/spec/1.2/spec.html#id2770814):

On output, a YAML processor must only produce acceptable characters. Any excluded characters must be presented using escape sequences. In addition, any allowed characters known to be non-printable should also be escaped. This isn’t mandatory since a full implementation would require extensive character property tables.

c-printable ::= #x9 | #xA | #xD | [#x20-#x7E]          /* 8 bit */
              | #x85 | [#xA0-#xD7FF] | [#xE000-#xFFFD] /* 16 bit */
              | [#x10000-#x10FFFF]                     /* 32 bit */

dumper.js follow the specification:

function isPrintable(c) {
return (0x00020 <= c && c <= 0x00007E)
|| ((0x000A1 <= c && c <= 0x00D7FF) && c !== 0x2028 && c !== 0x2029)
|| ((0x0E000 <= c && c <= 0x00FFFD) && c !== 0xFEFF /* BOM */)
|| (0x10000 <= c && c <= 0x10FFFF);
}

But the code that is calling isPrintable() use charCodeAt. This function will never return char > 0xFFFF so the supplementary planes range (0x010000 to 0x10FFFF) which include emojis will never be detected as printable.

for (i = 0; i < string.length; i++) {
char = string.charCodeAt(i);
if (!isPrintable(char)) {
return STYLE_DOUBLE;
}

I tested with codePointAt and Array.from(string) to iterate the chars while managing surrogate pairs: emojis are not escaped as desired.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions