
Utf8Decoder should be compatible with TextDecoder.decode #31370

Closed
@rakudrama

The browser's TextDecoder.prototype.decode treats encoded surrogates (U+D800 through U+DFFF) differently from Utf8Decoder.
This makes it difficult to use TextDecoder to accelerate conversion.
Acceleration is highly desirable: it improves one binary protobuf benchmark by 8x.

The main difference is that Utf8Decoder converts an encoded surrogate into a code point, whereas TextDecoder considers a surrogate to be an error. Depending on the fatal option, TextDecoder either throws an error or decodes the surrogate to U+FFFD REPLACEMENT CHARACTER.
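To make the TextDecoder side of this concrete, here is a minimal sketch that feeds the bytes ED A0 80 (the three-byte encoding of the lone surrogate U+D800) to TextDecoder in both modes. The variable names are illustrative only, and the sketch shows nothing about Utf8Decoder itself.

```typescript
// ED A0 80 is the three-byte UTF-8-style encoding of the lone surrogate U+D800.
const encodedSurrogate = new Uint8Array([0xed, 0xa0, 0x80]);

// Default (non-fatal) mode: malformed bytes are replaced with U+FFFD.
const lenient: string = new TextDecoder("utf-8").decode(encodedSurrogate);

// Fatal mode: the same input makes decode() throw instead of replacing.
let fatalThrew = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(encodedSurrogate);
} catch (e) {
  fatalThrew = true;
}

// True when the lenient result is non-empty and contains only U+FFFD code units.
let onlyReplacements = lenient.length > 0;
for (let i = 0; i < lenient.length; i++) {
  if (lenient.charCodeAt(i) !== 0xfffd) onlyReplacements = false;
}

console.log({ length: lenient.length, onlyReplacements, fatalThrew });
```

In neither mode does the surrogate code point itself come back, which is the incompatibility described above.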

It is not possible to get acceptable performance for allowMalformed: true by first trying TextDecoder with {fatal: true}, catching the Error, and re-decoding with the slow code. Throwing the error is ~1000x more expensive.
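For reference, the try-then-fall-back approach ruled out above might look like the following sketch. `decodeWithFallback` and the `SlowDecode` type are hypothetical names; `slowDecode` stands in for whatever existing hand-written decoder handles malformed input.

```typescript
// Hypothetical signature for the existing slow, malformed-tolerant decoder.
type SlowDecode = (bytes: Uint8Array) => string;

// One strict decoder, reused across calls; it throws on any malformed input.
const strictDecoder = new TextDecoder("utf-8", { fatal: true });

function decodeWithFallback(bytes: Uint8Array, slowDecode: SlowDecode): string {
  try {
    // Fast path: well-formed input is decoded natively.
    return strictDecoder.decode(bytes);
  } catch (e) {
    // Slow path: the input was malformed, so the throw has already been paid
    // for and the whole buffer is decoded a second time.
    return slowDecode(bytes);
  }
}

// Usage with a stand-in slow decoder that only reports that it was called.
const standIn: SlowDecode = (bytes) => "slow path, " + bytes.length + " bytes";
console.log(decodeWithFallback(new Uint8Array([0x68, 0x69]), standIn));
console.log(decodeWithFallback(new Uint8Array([0xed, 0xa0, 0x80]), standIn));
```

Every malformed input pays for both the exception and a second full decode, which is why this pattern only helps when malformed input is rare.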

Everything would be simpler if Utf8Decoder were completely aligned with TextDecoder.decode.

I have also verified that for other malformed inputs, TextDecoder and Utf8Decoder disagree on the number of U+FFFD replacements generated.
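One way to collect the TextDecoder half of such a comparison is sketched below. `countReplacements` and the sample inputs are illustrative choices, not taken from the original report; the Utf8Decoder counts would have to be gathered separately on the Dart side.

```typescript
// Counts U+FFFD characters in TextDecoder's non-fatal output for the given bytes.
// Note: input that legitimately encodes U+FFFD (EF BF BD) is counted as well.
function countReplacements(bytes: number[]): number {
  const text = new TextDecoder("utf-8").decode(new Uint8Array(bytes));
  return text.split("\uFFFD").length - 1;
}

// Illustrative malformed inputs (assumed examples, not from the issue).
const samples: Array<[string, number[]]> = [
  ["truncated 3-byte sequence then 'A'", [0xe2, 0x82, 0x41]],
  ["overlong encoding of NUL", [0xc0, 0x80]],
  ["encoded surrogate U+D800", [0xed, 0xa0, 0x80]],
  ["lone continuation byte", [0x80]],
];

for (const [label, bytes] of samples) {
  console.log(label + ": " + countReplacements(bytes) + " replacement(s)");
}
```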

Labels: area-core-library, library-convert
