Utf8Codec doesn't throw FormatException for single and paired UTF-16 surrogates  #28832

Closed
@sgrekhov

Description

Please see https://api.dartlang.org/stable/1.22.0/dart-convert/Utf8Codec/Utf8Codec.html

  const Utf8Codec({
    bool allowMalformed: false
  })
Instantiates a new Utf8Codec.

The optional allowMalformed argument defines how decoder (and decode) deal with invalid or unterminated character sequences.

If it is true (and not overridden at the method invocation) decode and the decoder replace invalid (or unterminated) octet sequences with the Unicode Replacement character U+FFFD (�). Otherwise they throw a FormatException.
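For reference, a minimal sketch of the two documented modes, assuming a current null-safe Dart SDK (so `new` is omitted). The helper `tryDecode` is hypothetical and not part of dart:convert. The input uses 0xFF, a byte that can never appear in UTF-8, so the result does not depend on how surrogates are handled.

```dart
import 'dart:convert';

/// Hypothetical helper: returns the decoded string, or null if the
/// decoder rejects [bytes] with a FormatException.
String? tryDecode(List<int> bytes, {required bool allowMalformed}) {
  try {
    return Utf8Codec(allowMalformed: allowMalformed).decode(bytes);
  } on FormatException {
    return null;
  }
}

void main() {
  const bytes = [0x61, 0xFF, 0x62]; // 'a', invalid byte, 'b'

  // Strict mode: the invalid byte makes decode throw, so we get null.
  print(tryDecode(bytes, allowMalformed: false)); // null

  // Lenient mode: the invalid byte becomes U+FFFD (65533).
  print(tryDecode(bytes, allowMalformed: true)?.codeUnits); // [97, 65533, 98]

  // The per-call argument overrides the constructor setting.
  print(const Utf8Codec().decode(bytes, allowMalformed: true).codeUnits);
}
```

Returning null instead of rethrowing is only a convenience for printing both outcomes side by side.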

For testing I used the malformed examples from http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt. Everything behaves as documented except sections 5.1 (Single UTF-16 surrogates) and 5.2 (Paired UTF-16 surrogates):

  Utf8Codec codec = new Utf8Codec(allowMalformed: false);
  // Single UTF-16 surrogates
  print(codec.decode([0xED, 0xA0, 0x80])); // �
  // Paired UTF-16 surrogates
  print(codec.decode([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])); // 𐀀

A FormatException should be thrown in both cases.

Please note that if allowMalformed is true, the Unicode Replacement character U+FFFD should be emitted instead:

  Utf8Codec codec = new Utf8Codec(allowMalformed: true);
  // Single UTF-16 surrogates
  print(codec.decode([0xED, 0xA0, 0x80]).codeUnits); // prints [55296] which is U+D800, but should be U+FFFD
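The byte pattern behind both cases is narrow: code points U+D800..U+DFFF would encode as 0xED, then a byte in 0xA0..0xBF, then one continuation byte, and UTF-8 forbids exactly that range. A rough sketch of a check for it follows. `containsEncodedSurrogate` is a hypothetical helper, not part of dart:convert, and it is not a full UTF-8 validator.

```dart
/// Hypothetical helper: returns true if [bytes] contains a three-byte
/// sequence that encodes a UTF-16 surrogate (U+D800..U+DFFF), i.e.
/// 0xED, then 0xA0..0xBF, then a continuation byte 0x80..0xBF.
bool containsEncodedSurrogate(List<int> bytes) {
  for (var i = 0; i + 2 < bytes.length; i++) {
    final second = bytes[i + 1];
    final third = bytes[i + 2];
    if (bytes[i] == 0xED &&
        second >= 0xA0 &&
        second <= 0xBF &&
        third >= 0x80 &&
        third <= 0xBF) {
      return true;
    }
  }
  return false;
}

void main() {
  // Single surrogate from section 5.1 (U+D800).
  print(containsEncodedSurrogate([0xED, 0xA0, 0x80])); // true
  // Surrogate pair from section 5.2 (U+D800 U+DC00).
  print(containsEncodedSurrogate([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])); // true
  // U+D7FF, the last code point before the surrogate range, is valid.
  print(containsEncodedSurrogate([0xED, 0x9F, 0xBF])); // false
}
```

A strict decoder would be expected to throw on any input for which this returns true, and a lenient one to substitute U+FFFD.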

Metadata

    Labels

    area-core-library: SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries.
    core-a
    library-convert
    type-bug: Incorrect behavior (everything from a crash to more subtle misbehavior)