Utf8Decoder should be compatible with TextDecoder.decode #31370

The browser's `TextDecoder.prototype.decode` treats surrogates (`U+D800` through `U+DFFF`) differently from `Utf8Decoder`.
This makes it difficult to use `TextDecoder` to accelerate conversion.
Acceleration is highly desirable: it improves one binary protobuf benchmark by 8x.

The main difference is that `Utf8Decoder` converts a surrogate into a code point, whereas `TextDecoder` treats a surrogate as an error. Depending on the `fatal` option, `TextDecoder` either throws or replaces the surrogate with U+FFFD REPLACEMENT CHARACTER.
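To make the `TextDecoder` side of this concrete, here is a minimal TypeScript sketch. It assumes a runtime with a standards-conformant global `TextDecoder` (a browser or a recent Node.js). The bytes `ED A0 80` are the three-byte UTF-8-style encoding of the lone surrogate U+D800, and the helper name `describeSurrogateDecoding` is made up for illustration.

```typescript
// Bytes ED A0 80: the UTF-8-style encoding of the lone surrogate U+D800.
const loneSurrogate = new Uint8Array([0xed, 0xa0, 0x80]);

// Hypothetical helper: reports what TextDecoder does in each mode.
function describeSurrogateDecoding(
  bytes: Uint8Array
): { lenient: string; fatalThrew: boolean } {
  // Non-fatal mode: malformed input is replaced with U+FFFD.
  const lenient = new TextDecoder("utf-8", { fatal: false }).decode(bytes);

  // Fatal mode: malformed input raises a TypeError.
  let fatalThrew = false;
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    fatalThrew = e instanceof TypeError;
  }
  return { lenient, fatalThrew };
}

const surrogateResult = describeSurrogateDecoding(loneSurrogate);
```

Under the WHATWG Encoding Standard, `surrogateResult.lenient` should consist only of U+FFFD characters (one per byte, as far as I can tell from the decoder algorithm) and never contain U+D800 itself, while `surrogateResult.fatalThrew` should be `true`.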

It is not possible to get acceptable performance for `allowMalformed: true` by first decoding with `{fatal: true}`, then catching the error and re-decoding with the slow code: throwing the error is ~1000x more expensive.
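For reference, the try-then-fall-back pattern being ruled out looks roughly like the TypeScript sketch below. The names `slowDecode` and `decodeWithFallback` are hypothetical, and `slowDecode` is only a stand-in (it uses a non-fatal `TextDecoder`) for whatever hand-written lenient decoder would really be used.

```typescript
// Hypothetical slow path standing in for a hand-written decoder that
// tolerates malformed input; here it simply uses a non-fatal TextDecoder.
function slowDecode(bytes: Uint8Array): string {
  return new TextDecoder("utf-8", { fatal: false }).decode(bytes);
}

const strictDecoder = new TextDecoder("utf-8", { fatal: true });

// Try the strict fast path first; fall back only when it throws.
function decodeWithFallback(
  bytes: Uint8Array
): { text: string; usedFallback: boolean } {
  try {
    return { text: strictDecoder.decode(bytes), usedFallback: false };
  } catch {
    // This branch is the costly one: an exception is created and
    // unwound before the input is decoded a second time.
    return { text: slowDecode(bytes), usedFallback: true };
  }
}

const wellFormed = decodeWithFallback(new Uint8Array([0x68, 0x69])); // "hi"
const malformed = decodeWithFallback(new Uint8Array([0x68, 0xff])); // 0xFF is never valid UTF-8
```

The pattern is functionally fine, but every malformed input pays for a thrown exception plus a second decode, which is where the cost described above comes from.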

Everything would be simpler if `Utf8Decoder` were completely aligned with `TextDecoder.decode`.

I have also verified that, for other malformed inputs, `TextDecoder` and `Utf8Decoder` disagree on the number of U+FFFD replacements generated.
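The `TextDecoder` half of that comparison can be probed with a small TypeScript sketch like the one below. It again assumes a standards-conformant global `TextDecoder`; the helper `countReplacements` and the sample inputs are my own illustrative choices, not the inputs used in the verification above.

```typescript
// Count U+FFFD characters in the non-fatal decoding of a byte sequence.
// Note: this also counts any U+FFFD that was legitimately encoded in the input.
function countReplacements(bytes: number[]): number {
  const text = new TextDecoder("utf-8").decode(new Uint8Array(bytes));
  return text.split("\uFFFD").length - 1;
}

// Truncated three-byte sequence (first two bytes of U+20AC).
const truncatedEuro = countReplacements([0xe2, 0x82]);
// Overlong two-byte encoding of U+0000.
const overlongNul = countReplacements([0xc0, 0x80]);
// Overlong four-byte encoding of U+0000.
const overlongFourByte = countReplacements([0xf0, 0x80, 0x80, 0x80]);
```

Following the decoder algorithm in the WHATWG Encoding Standard, these should come out as 1, 2 and 4 respectively: a truncated but otherwise valid prefix collapses into a single U+FFFD, while each byte of an overlong form is replaced separately. Running the same inputs through `Utf8Decoder` with `allowMalformed: true` would show where the counts diverge.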
